I. INTRODUCTION


This final report on project NAS 2-5643, "Study of
Sequential Decoding," consists of two main portions: the results
of Phases I and II of our research. It covers results
obtained in the period September 1970 through January 1972.
Earlier work, covering September 1969 through August 1970,
is contained in the Annual Report of September 1970.

Phase I deals with problems of reliable transmission
through noisy space channels and is subdivided into nine
areas, reported in Chapter II (see Table of Contents).

Phase II of the project deals with problems of en- 
coding of space sources for the purpose of data compression. 
It is subdivided into four areas that are reported in 
Chapter III. 

Chapter IV lists the theses, publications, and talks 
that were based on work supported by this project. 

A substantial portion of this report has already been 
presented in Quarterly Progress Reports 5 through 8. 
II. REPORT ON PHASE I


II-A. Theoretical Performance Curves for Bootstrap 
Sequential Decoding 

We have evaluated $R_{BOOT}$ and $R_{BOOT}(1)$ vs. $10\log_{10} E_b/N_0$ performance
curves of quaternary and octal quantized Gaussian channels with binary antipodal
inputs. $E_b$ denotes the energy per information bit. As previously, the rates given
do not include the degradation factor $(m-1)/m$ corresponding to the single parity
algebraic code. Each of the curves includes parameter values $K$ denoting the least
number of streams for which the curves are valid (for $m < K$, better performance is
obtainable). The performance curves were obtained for uniform quantization at the
receiver, whose intervals were optimized with the help of Figures 1 through 7. The
latter are parametric curves (with respect to a fixed SNR) showing the performance
as a function of varying quantization size. It is interesting to note that in each
figure, the optimal quantization size (expressed as a fraction of the signal-to-noise
ratio) is almost invariant to any changes in the value of $E_b/N_0$.

Figures 1 through 5 deal with $R_{BOOT}$ and correspond to the following
cases: quaternary channel with a binary state stream (1) and with a full
(quaternary) state stream (2); octal channel with a binary state stream (3), with
a quaternary state stream (4), and with a full (octal) state stream (5). In case
(4) the quaternary state stream was obtained by lumping together the three
neighboring output digits that correspond to the extreme quantization values
on each side of the 0 point (this is the optimal lumping procedure).
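The lumping used in case (4) can be sketched in a few lines of Python. This is a minimal illustration, not the program behind Figures 1 through 5: it assumes the octal output levels are indexed 0 through 7 from most negative to most positive (so levels 0 to 3 lie below the 0 point), and the names `LUMP_MAP`, `lump_octal_to_quaternary`, and `lump_transition_row` are hypothetical.

```python
# Assumed indexing: octal levels 0..7 run from most negative to most positive,
# so levels 0..3 lie below the 0 point and 4..7 above it.
# The three extreme levels on each side are merged; the two innermost survive.
LUMP_MAP = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 3}


def lump_octal_to_quaternary(level: int) -> int:
    """Map one octal output level to the quaternary state-stream alphabet."""
    if level not in LUMP_MAP:
        raise ValueError(f"octal level must be 0..7, got {level}")
    return LUMP_MAP[level]


def lump_transition_row(row: list[float]) -> list[float]:
    """Add the probabilities of octal outputs that share a quaternary symbol."""
    if len(row) != 8:
        raise ValueError("expected 8 transition probabilities")
    lumped = [0.0] * 4
    for level, prob in enumerate(row):
        lumped[LUMP_MAP[level]] += prob
    return lumped


if __name__ == "__main__":
    print([lump_octal_to_quaternary(v) for v in range(8)])  # → [0, 0, 0, 1, 2, 3, 3, 3]
    print(lump_transition_row([0.125] * 8))                 # → [0.375, 0.125, 0.125, 0.375]
```

Merging transition probabilities by simple addition is what lumping output symbols means for the channel seen by the state stream; the row used in the usage example is a placeholder, not data from the figures.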

Figures 6 and 7 deal with $R_{BOOT}(1)$ for the quaternary (6) and octal
(7) channels with a binary state stream.

Figures 8 through 12 then give the $R_{BOOT}(1)$ vs. $10\log_{10} E_b/N_0$ relationship
for optimal uniform quantization at the receiver. All these curves contain
parametric indications of the SNR per transmitted bit (in dB), i.e., the energy
per transmitted bit relative to $N_0$. Also shown are the previously mentioned
K-limits. Figure 8 compares the performance of Bootstrap Hybrid Decoding for
binary, quaternary, and octal quantization with full channel state streams. It can
be seen that in the limit of low rates, quaternary quantization constitutes an
improvement of about 1.35 dB over binary, and octal quantization constitutes a
0.35 dB improvement over quaternary. Figure 9 shows the same relationships
for a binary state stream. There, quaternary quantization is 1.4 dB better
than binary, and octal is 0.4 dB better than quaternary.

Figure 10 contains $R_{comp}$, $R_{BOOT}$, $R_{BOOT}(1)$, and capacity curves for binary
quantization. In the limit of low rates, bootstrap decoding has a 1.7 dB advantage
over sequential decoding (the degradation factor $(m-1)/m$ is not included).

Figure 11 contains $R_{comp}$, $R_{BOOT}$, and capacity curves for quaternary quanti-
zation. It is seen that a full state stream enjoys a noticeable advantage over a
binary one only for rates R > 1/4. Furthermore, this advantage is always
small (at most 0.15 dB). Again, in the limit of small rates, bootstrap decoding
is about 1.7 dB better than sequential decoding. Figure 12 concerns octal
quantization. An octal state stream is nowhere noticeably better than a quat-
ernary one, and a binary state stream is worse than the latter only for R > 1/4.
The 1.7 dB advantage over sequential decoding is again evident.
Next, Figure 13 compares $R_{BOOT}$ performances of binary, quaternary,
and octal quantization with a binary channel state stream. The curves have a
slight upward slope for low rates, indicating that the upper bound tightens as
the rates decrease. In fact, comparison with Figure 9 shows that the $R_{BOOT}$
and $R_{BOOT}(1)$ limits are the same.


It should be noted that Figures 10 through 12 show a consistent 1.1 dB
advantage of capacity over $R_{BOOT}$. This suggests that a worthwhile improvement
might be obtainable from the use of more sophisticated algebraic "outer" codes.


The final six curves (Figures 14 through 19) pertaining to this section
show the Pareto exponent as a function of SNR per transmitted bit (in dB) for
bootstrap and straight sequential decoding at fixed track rate $R = 1/2$ (the
degradation factor is not included). $\gamma_{upper}$ denotes the exponent obtainable
from the upper bound and $\gamma_{lower}$ that from the lower bound on bootstrap
decoding. Finally, $\alpha(\infty)$ denotes the exponent for sequential decoding.

All the curves show that $\gamma_{upper}$ and $\gamma_{lower}$ approach each other with
increasing SNR, and pull away from $\alpha(\infty)$. It is interesting to note (compare
Figures 14, 15, and 17 and Figures 16 and 18) that performance improves only
slightly as the output quantization is refined, provided the alphabet of the
channel state stream stays constant. However, for a large signal-to-noise
ratio, the performance of the quaternary bootstrap scheme with a quaternary
state stream is better than that achievable with an octal bootstrap scheme with
a binary state stream. In general, the improvement obtainable from an in-
crease in the state stream alphabet increases with the SNR.
II-B. Development of Programs for Simulation of Bootstrap
Sequential Decoding 

1. While inspecting some simulation results, we noticed
that with the systematic convolutional code being used, the
bootstrap decoder commits a considerable number of decoding
errors. We have therefore adjusted both the rudimentary and
the pull-up decoders so that they insert these errors into
the state stream and continue decoding (instead of stopping,
as previously). Selective simulations suggested the
following conclusions:

The errors committed by the rudimentary scheme occur mostly in the
tails (hence a tail length longer than 25 definitely seems indicated). When these
are inserted into the state stream, the decoder is able to finish the entire
block at the price of inserting into some decoded stream those errors that
are forced by the parity relationship.

The pull-up scheme works at a lower SNR. When the errors committed
on stream J are inserted into the state stream, the parity forces them into
some stream K. The decoder is then capable of decoding all but the J-th and
K-th streams, and is not able to continue the decoding of the latter within a
reasonable number of steps. This suggests that the errors can again be
eliminated at the end simply by re-decoding the J-th and K-th streams from their
beginning. The simulation results reported in the next
section are based on programs that incorporate the above
changes.
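The reason an error in stream J reappears in another stream is the modulo-two relationship itself. The minimal Python sketch below (hypothetical function name, toy bits) takes the simplest case: every stream except one is known, and the remaining stream is filled in from even parity. A wrong estimate for one stream then shows up, bit for bit, in the stream completed by parity.

```python
def parity_complete(known_tracks: list[list[int]]) -> list[int]:
    """Return the track that gives every position of the block even parity."""
    return [sum(column) % 2 for column in zip(*known_tracks)]


if __name__ == "__main__":
    data = [[1, 0, 1, 1], [0, 1, 1, 0]]
    parity = parity_complete(data)                  # transmitted parity track
    wrong = [[1, 0, 0, 1], [0, 1, 1, 0]]            # stream J decoded with one error
    forced = parity_complete(wrong)                 # stream K as implied by parity
    print([a ^ b for a, b in zip(forced, parity)])  # → [0, 0, 1, 0]: same error position
```

In the actual decoders stream K is decoded sequentially rather than copied from the parity relation, but the state stream carries the same information, which is why both streams have to be re-decoded.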
2. Fortran versions of the rudimentary and pull-up bootstrap
hybrid decoders based on the Fano algorithm were debugged. 
Simulation indicates that these decoders examine on the average 
four times the number of nodes examined by the corresponding 
stack-based algorithm. The results reported in the next sec- 
tion are based on Fano sequential decoders. 

3. A bootstrap algorithm was constructed that is suitable 
for decoding of channels with binary inputs and quaternary 
or octal outputs. These channels arise from optimal equal 
level quantization of Gaussian additive noise channels. The 
program has a preamble that computes the channel transition 
probabilities corresponding to that quantization, as a 
function of a supplied SNR in dB. The bootstrapping 
algorithm utilizes a binary channel state stream. The next 
section reports simulation results based on this program. 
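A preamble of the kind described in item 3 can be sketched as follows. This is an illustration under stated assumptions, not the project's program: input 0 is sent as +1 and input 1 as -1, the supplied SNR is the energy per transmitted bit over $N_0$ (so the noise variance is $1/(2\,\mathrm{SNR})$ for unit amplitude), and the thresholds sit at 0, ±step, ±2·step, and so on. The parameter `step` stands in for the optimized quantization interval, and all names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def quantized_transitions(snr_db: float, levels: int, step: float) -> list[list[float]]:
    """P(output level j | input x) for x = 0 (row 0) and x = 1 (row 1).

    Assumptions: input 0 is sent as +1 and input 1 as -1; snr_db is the energy
    per transmitted bit over N0, so the noise variance is 1 / (2 * SNR);
    thresholds lie at 0, +-step, +-2*step, ... (in units of the signal
    amplitude); output levels are indexed from most negative to most positive.
    """
    if levels < 2 or levels % 2:
        raise ValueError("levels must be an even number >= 2")
    snr = 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * snr))
    half = levels // 2
    edges = [-math.inf] + [k * step for k in range(-(half - 1), half)] + [math.inf]
    table = []
    for amplitude in (+1.0, -1.0):
        table.append([
            normal_cdf((hi - amplitude) / sigma) - normal_cdf((lo - amplitude) / sigma)
            for lo, hi in zip(edges[:-1], edges[1:])
        ])
    return table


if __name__ == "__main__":
    for row in quantized_transitions(snr_db=0.42, levels=8, step=0.5):
        print([round(prob, 4) for prob in row])
```

With `levels=2` the table reduces to the hard-decision BSC, whose crossover probability is $Q(\sqrt{2\,\mathrm{SNR}})$; that limiting case is a convenient check on the threshold bookkeeping.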

4. A generalization of our original bootstrap algorithm
was constructed that is suitable for decoding of channels
with binary inputs and quaternary or octal outputs. The
algorithm has a full output alphabet state stream. The pro-
gram has a preamble that computes the channel transition
probabilities corresponding to optimal uniform 4- or 8-level
output quantization of a Gaussian additive noise channel
as a function of a supplied SNR value in dB. Theoretical
curves of Section II-A indicate that the dB gain arising
from this refinement will be only a moderate one. Nevertheless,
a strategy employing the refinement in case the binary
state stream algorithm runs into trouble might be well
worth considering.



5. An algorithm was debugged which uses a three-group
algebraic outer code with a convolutional inner code. The
operation of the algebraic part of the algorithm is described
in Section II-F.



II-C. Simulated Performance of Bootstrap Sequential Decoding


L. B. Hofman used the various algorithmic techniques developed 
under this contract to construct programs simulating the performance of 
the Bootstrap Sequential Decoding Algorithm. He summarized his results 
in the paper "Performance Results for a Hybrid Coding System" that he 
presented at the 1971 International Telemetering Conference. This work
is reproduced below:



Summary.- Computer simulation studies of the hybrid pull-up bootstrap decoding algorithm 
have been conducted using a constraint length 24, nonsystematic, rate 1/2 convolutional code 
for the symmetric channel with both binary and 8-level quantized outputs. Computational 
performance was used to measure the effect of several decoder parameters and determine 
practical operating constraints. Results reveal that the track length may be reduced to 500 
information bits with small degradation in performance. The optimum number of tracks per 
block was found to be in the range of 7 to 11. An effective technique was devised to efficiently
allocate computational effort and identify reliably decoded data sections. Long simulations 
indicate that a practical bootstrap decoding configuration has a computational performance 
about 1.0 dB better than sequential decoding and an output bit error rate of about 2.5 × 10^-6 near
the Rcomp point.

Introduction.- The basic coding dilemma is one of exponentially increasing decoding
complexity as the theoretical capacity of a communications channel is approached. Hybrid
coding is a cascade or concatenation of block and/or convolutional codes in an attempt to
operate near capacity while keeping the complexity lower than that required by either code
type alone. This paper presents the results of a study of the hybrid bootstrap coding system of
Jelinek.¹ This technique is similar to a simple case of the Falconer scheme² in that a parity
relationship between a set of convolutionally encoded data tracks is used to aid in the decoding 
of those portions that are difficult. (An even parity is assumed throughout.) It differs from the 
Falconer scheme, which uses an algebraic relationship to derive directly the most difficult 
portions after a sufficient number of others are decoded, by making use of additional 
probabilistic information contained in the parity relationship. In so doing, each bit of data 
decoded helps to “bootstrap” those remaining. 

After reviewing briefly the functioning of bootstrap decoding, this paper examines the 
computational effect of several decoder parameters and determines a practical range of 
operating values. Detailed performance behavior of such an optimized system is presented and 
compared to simple sequential decoding and Falconer decoding. 

Encoding.- The encoding function is the same for all variations of bootstrap decoding
described in this paper. (The decoders differ only in the manner in which they utilize
information that is always available at the receiver.) Basically, m-1 independent,
convolutionally encoded “data tracks” are linked together into one “decoding block” by the 
addition of an m-th “parity track.” That is, each bit of the parity track is the modulo-two sum 
of the corresponding bits in the data tracks. Because of the linearity of convolutional codes, this 
parity track is also decodable and, as will be shown, may actually be generated by a 
convolutional encoder. The reader will note that this encoding function is identical to that 
required by the Falconer system for a simple parity check code. 

Actual mechanization of the encoder depends upon several operational considerations. One
method, which requires m-1 convolutional encoders, provides natural interleaving of the tracks.
Data are routed to the encoders for coding and transmission in a “round robin” fashion, with
the parity bit inserted in its turn by a modulo-two adder. Decoder synchronization for such a
scheme will be difficult; synchronization and tail-forcing bits must be independent of data
formatting, possibly causing a small data buffering problem at the end of each block. Failure of
the decoder to complete the decoding of a block results in the loss of a large amount of data, if
not the entire block.

An alternative way of mechanizing the encoder requires one convolutional encoder and a 
storage register having the length of a track. The data are encoded and transmitted, one track at 
a time, while the parity track is formed in the storage register. The contents of the storage 
register are then encoded, transmitted, and reset following the last data track of each block. 
Although this scheme does not provide interleaving and causes an even larger buffering problem
while the parity track is transmitted, it does offer several advantages. It is possible to let data
formatting correspond to individual data tracks. Code synchronization can be performed easily
on a track basis, with block synchronization derived from identification bits embedded in the
data tracks. In addition, a decoder failure will not necessarily result in the loss of a full block of
data. Finally, since data formatting, synchronization, and tail-forcing can be related, the rate loss
for these functions can be reduced.
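The single-encoder mechanization, together with the linearity argument from the Encoding paragraph, can be checked with a toy encoder. In this minimal Python sketch the constraint-length-3 code with generators 7 and 5 (octal) is a hypothetical stand-in for the constraint-length-24 code used in the simulations, and the function names are illustrative. The point is only that the encoded parity track equals the modulo-two sum of the encoded data tracks, so every coded position of the block has even parity.

```python
# Hypothetical stand-in code: constraint length 3, generators 7 and 5 (octal).
K = 3
G1, G2 = 0b111, 0b101


def conv_encode(bits: list[int]) -> list[int]:
    """Rate-1/2 feedforward convolutional encoder, flushed with K-1 zero tail bits."""
    state = 0
    out: list[int] = []
    for b in bits + [0] * (K - 1):
        state = ((state << 1) | b) & ((1 << K) - 1)   # newest bit in the LSB
        out.append(bin(state & G1).count("1") % 2)
        out.append(bin(state & G2).count("1") % 2)
    return out


def encode_block(data_tracks: list[list[int]]) -> list[list[int]]:
    """Encode the data tracks one at a time while a storage register accumulates
    their modulo-two sum; then encode the register as the parity track."""
    register = [0] * len(data_tracks[0])
    coded = []
    for track in data_tracks:
        coded.append(conv_encode(track))
        register = [r ^ b for r, b in zip(register, track)]
    coded.append(conv_encode(register))   # register is reset for the next block
    return coded


if __name__ == "__main__":
    block = encode_block([[1, 0, 1, 1, 0], [0, 1, 1, 0, 1], [1, 1, 0, 0, 0]])
    # Linearity: the coded tracks, parity track included, sum to zero modulo two.
    print(all(sum(column) % 2 == 0 for column in zip(*block)))  # → True
```

Because the encoder starts each track from the all-zero state, encoding the register gives the same result as summing the encoded data tracks, which is why the parity track is itself a decodable codeword.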

Rudimentary Bootstrap Decoding.- Bootstrap decoding is applicable to all symmetrical
binary input channels. For the purposes of this paper, a simplified description of the
“rudimentary” algorithm for the binary symmetric channel (BSC) is given, following the outline
used by Jelinek.¹

After the encoded data have been received and are synchronized, the bits of a block are 
grouped into m tracks, and an additional track, the “channel state stream,” is formed by the 
decoder. Each channel state stream bit is the modulo-two sum of corresponding bits in the 
parity and data tracks. The channel state stream differs from the parity track because it includes 
the parity track and is formed after the transmitted sequence is corrupted by noise. Therefore, a 
“zero” in this track indicates that an even number of errors was received at a given position, and 
a “one” indicates an odd number. 
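Forming the channel state stream is a position-by-position modulo-two sum. The minimal Python sketch below (hypothetical names, toy block) uses a transmitted block whose last track is the parity of the others, so the state bit at each position equals the parity of the number of channel errors there.

```python
def channel_state_stream(received: list[list[int]]) -> list[int]:
    """Modulo-two sum, position by position, of the received bits of all m tracks."""
    if len({len(track) for track in received}) != 1:
        raise ValueError("all tracks must have the same length")
    return [sum(column) % 2 for column in zip(*received)]


if __name__ == "__main__":
    # Toy block: the last track is the parity of the two data tracks.
    sent = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
    errors = [[1, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 1]]
    received = [[s ^ e for s, e in zip(st, et)] for st, et in zip(sent, errors)]
    print(channel_state_stream(received))  # → [0, 0, 1, 1]
```

Position 0 carries two errors and so yields a “zero,” even though two of its bits are wrong; this is exactly the ambiguity that the even/odd probabilities below quantify.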

The probabilities that k bits which are independently transmitted through a BSC of
crossover probability p will be received with an even or odd number of errors are given by

$$q_k(0) = \frac{1 + (1-2p)^k}{2}, \qquad q_k(1) = \frac{1 - (1-2p)^k}{2}$$
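The closed form follows from adding and subtracting the binomial expansions of $[(1-p)+p]^k = 1$ and $[(1-p)-p]^k = (1-2p)^k$: the sum keeps twice the even-error terms and the difference twice the odd-error terms. The short Python sketch below (illustrative function names) checks the formula against brute-force enumeration of all $2^k$ error patterns.

```python
from itertools import product


def q(k: int, parity: int, p: float) -> float:
    """Closed form for q_k(0) (parity=0, even) and q_k(1) (parity=1, odd)."""
    s = (1.0 - 2.0 * p) ** k
    return (1.0 + s) / 2.0 if parity == 0 else (1.0 - s) / 2.0


def q_enumerated(k: int, parity: int, p: float) -> float:
    """Same probability obtained by summing over all 2**k error patterns."""
    total = 0.0
    for pattern in product((0, 1), repeat=k):
        n_errors = sum(pattern)
        if n_errors % 2 == parity:
            total += p ** n_errors * (1.0 - p) ** (k - n_errors)
    return total


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, round(q(k, 0, 0.1), 6), round(q_enumerated(k, 0, 0.1), 6))
```

The case k = 0 (no remaining tracks) gives $q_0(0) = 1$ and $q_0(1) = 0$, which is what makes the last undecoded track fully determined by the state stream.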

This information is used to form an augmented transition probability matrix $w_m(y,z\mid x)$,
where y is the received bit and z is the channel state bit associated with y and formed
over m tracks, given that x was transmitted. Thus:

$$\begin{aligned}
w_m(0,0\mid 0) &= w_m(1,0\mid 1) = (1-p)\,q_{m-1}(0)\\
w_m(0,1\mid 0) &= w_m(1,1\mid 1) = (1-p)\,q_{m-1}(1)\\
w_m(1,0\mid 0) &= w_m(0,0\mid 1) = p\,q_{m-1}(1)\\
w_m(1,1\mid 0) &= w_m(0,1\mid 1) = p\,q_{m-1}(0)
\end{aligned}$$

It is natural to use these augmented transition probabilities in forming the bit likelihood
function for sequential decoding. The function is

$$\lambda_m(y,z\mid x) = \log_2\frac{w_m(y,z\mid x)}{w_m(y,z)} - R$$

where

$$w_m(y,z) = \frac{w_m(y,z\mid 0) + w_m(y,z\mid 1)}{2} = \frac{q_m(z)}{2}$$

and R is the bias factor.
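The four cases of $w_m(y,z\mid x)$ collapse into one rule: the bit is in error with probability p (else 1-p), and the other m-1 tracks must then supply an error count of parity $z \oplus (y \oplus x)$. The Python sketch below (illustrative names; bias R = 0.5 as in the simulations described later) builds the likelihood value from that rule and returns $-\infty$ for an impossible hypothesis.

```python
import math


def q(k: int, z: int, p: float) -> float:
    """q_k(z): probability of an even (z=0) or odd (z=1) number of errors in k BSC uses."""
    s = (1.0 - 2.0 * p) ** k
    return (1.0 + s) / 2.0 if z == 0 else (1.0 - s) / 2.0


def w_aug(m: int, y: int, z: int, x: int, p: float) -> float:
    """Augmented transition probability w_m(y, z | x) on a BSC with crossover p."""
    error = y ^ x            # 1 if this bit was received in error
    others = z ^ error       # error parity required of the other m-1 tracks
    return (p if error else 1.0 - p) * q(m - 1, others, p)


def bootstrap_metric(m: int, y: int, z: int, x: int, p: float, bias: float = 0.5) -> float:
    """log2[w_m(y,z|x) / w_m(y,z)] - R, with -inf for an impossible hypothesis."""
    numerator = w_aug(m, y, z, x, p)
    if numerator == 0.0:
        return -math.inf
    denominator = 0.5 * (w_aug(m, y, z, 0, p) + w_aug(m, y, z, 1, p))
    return math.log2(numerator / denominator) - bias


if __name__ == "__main__":
    p = 0.1
    for m in (1, 2, 3, 7):
        row = [bootstrap_metric(m, 0, z, x, p) for z in (0, 1) for x in (0, 1)]
        print(m, [round(v, 3) for v in row])
```

For large m the values approach the ordinary sequential-decoding metric $\log_2[2(1-p)] - R$ for agreement, since the state bit then carries almost no information about any single bit.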

From this starting point, the development of the rudimentary bootstrap decoding algorithm
follows directly. The first of the m tracks is sequentially decoded using the channel state
stream and likelihood values defined above. If, after a preassigned amount of effort, decoding of
this track is not completed, restart values are saved. This step is repeated on successive tracks,
looping back to the first track if necessary, until decoding of one is completed. At this time, the
received sequence for the completely decoded track is replaced by the newly estimated
sequence, and the channel state stream is recomputed. If the decoding was error free, then the
new channel state stream values represent an even or odd number of errors in the m-1
remaining tracks, as before. The entire process is repeated, excluding the decoded track, now
using likelihood values for m-1 tracks. When a second track is completed, its received sequence
is replaced, and the channel state stream is again updated.
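The control flow of the rudimentary algorithm can be sketched independently of the sequential decoder itself. In this minimal Python sketch, `try_decode` is a hypothetical interface standing in for a budget-limited Fano or stack decoder (which would also keep its restart values between calls); it receives the track index, the number k of undecoded tracks, and the current state stream, and returns an estimate or `None`.

```python
from typing import Callable, Optional

# Hypothetical decoder interface: (track, k undecoded tracks, state stream, budget)
# -> decoded bits, or None if the effort budget ran out.
TryDecode = Callable[[int, int, list[int], int], Optional[list[int]]]


def state_stream(tracks: list[list[int]]) -> list[int]:
    """Modulo-two sum of all tracks, position by position."""
    return [sum(column) % 2 for column in zip(*tracks)]


def rudimentary_bootstrap(received: list[list[int]], try_decode: TryDecode,
                          budget: int, max_passes: int) -> dict[int, list[int]]:
    """Cycle over undecoded tracks until one completes; substitute its estimate
    for the received sequence, recompute the state stream, and repeat."""
    tracks = [list(t) for t in received]      # working copy of the block
    undecoded = list(range(len(tracks)))
    decoded: dict[int, list[int]] = {}
    failed_passes = 0
    while undecoded and failed_passes < max_passes:
        # If earlier estimates are correct, z is the error parity of the undecoded tracks.
        z = state_stream(tracks)
        k = len(undecoded)                    # likelihood tables for k tracks
        for j in undecoded:
            estimate = try_decode(j, k, z, budget)
            if estimate is not None:
                break
        else:                                 # no track finished within its budget
            failed_passes += 1
            continue
        tracks[j] = estimate                  # replace received sequence by estimate
        decoded[j] = estimate
        undecoded.remove(j)
    return decoded
```

Here `max_passes` is a simple stand-in for the limit on total work; when it is reached, the tracks decoded so far are returned and the rest of the block is left undecoded.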

The pattern is now obvious, and the process is repeated until all tracks have been decoded or
the total work exceeds a maximum amount. It would be possible to derive the last remaining
track, on the basis of the parity relationship, when m-1 tracks have been decoded. Indeed, this
is the principle of Falconer decoding; but it is actually simpler to decode this track, too, since
the decoding requires exactly one computation per bit. This fact, and the general effect of using
the channel state stream, may be seen in the sample likelihood table shown in figure 1. When
many tracks are undecoded, the channel state bit gives little additional information about the
probability of error in a single received bit. Therefore, for large k (the number of undecoded
tracks), the likelihood values for bootstrap decoding approach the usual values for sequential
decoding, depending mainly upon agreement or disagreement between the received bit and the
hypothesis. At the other extreme, for small k, the channel state bit has a large influence. For
example, if two tracks remain undecoded (k = 2) and the channel state bit is “one,” neither
hypothesis is reliable because the probability is 0.5 that the received bit is in error. On the other
hand, great reliance is placed on the correctness of the received bit when the channel state bit is
“zero,” since the probability of a double error is small. When k = 1, the knowledge that the
received bit is in error for a channel state bit “one” and correct for a “zero” is reflected in the
table by a likelihood value of −∞ for the impossible hypothesis and 1.0 for the correct hypothesis.
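The numbers quoted in this paragraph follow from Bayes' rule on the BSC: a received bit is wrong with probability p, in which case the other k-1 tracks must contribute an error count of parity 1 - z, and the result is divided by the probability $q_k(z)$ of the observed state bit. The Python sketch below (illustrative names) computes that posterior error probability.

```python
def q(k: int, z: int, p: float) -> float:
    """q_k(z): probability of an even (z=0) or odd (z=1) number of errors in k BSC uses."""
    s = (1.0 - 2.0 * p) ** k
    return (1.0 + s) / 2.0 if z == 0 else (1.0 - s) / 2.0


def p_error_given_state(k: int, z: int, p: float) -> float:
    """P(received bit is wrong | state bit z, k undecoded tracks) on a BSC(p)."""
    return p * q(k - 1, 1 - z, p) / q(k, z, p)


if __name__ == "__main__":
    p = 0.05
    for k in range(1, 8):
        print(k, round(p_error_given_state(k, 0, p), 4), round(p_error_given_state(k, 1, p), 4))
```

For k = 2 and a state bit “one,” the expression reduces to $p(1-p)/[2p(1-p)] = 1/2$ for every p, and for k = 1 it is exactly 0 or 1, which is the source of the $-\infty$ entry.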

Pull-Up Algorithm.- The primary worth of the rudimentary algorithm is the description of 
the bootstrapping process and simplification of its analysis. Practical use of the rudimentary 
algorithm is probably limited because one rather simple modification substantially increases the 
power of the decoder. In the modified algorithm, called the “pull-up” algorithm, the decoder 
does not wait until a track is decoded completely before updating the channel state stream. It 
operates instead on a single track until the track is completed or a difficult-to-decode section is 
sensed, at which time decoding is stopped. The completed track is handled as in the 
rudimentary algorithm. Before proceeding with the next track after a track is terminated, 
however, the decoder declares that portion which it deems reliable to be “definitely decoded.” 
In doing so, it updates the channel state stream and prepares restart values so that the next 
decoding attempt on the track will begin immediately after the definitely decoded section. 

Since it is possible to have all tracks in varying stages of completion, to obtain the most 
effective use of the channel state stream it is necessary to indicate how many tracks remain 
undecoded at a given node. This is done with a vector, KLEFT, the length of a track, which the 
decoder references to determine the likelihood values to use at a given node. At the outset, all 
KLEFT values are set to m, the number of tracks in a block, and are adjusted accordingly as 
individual tracks are “pulled up.” Note that it is necessary each time to start decoding from the 
“Origin” because the state stream may change from the time the decoder terminates a track to 
the time the decoder restarts it. 
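The KLEFT bookkeeping can be sketched as a per-node counter plus a per-track frontier. In this minimal Python sketch, `kleft[n]` is the number of tracks not yet definitely decoded at node n and `frontier[j]` is the first node of track j that is not yet definitely decoded. The class and method names are hypothetical, and `release` reflects our reading of the restart procedure described under the stopping rule, in which definitely decoded sections of uncompleted tracks are decoded again.

```python
class PullUpBookkeeping:
    """Hypothetical bookkeeping for the pull-up decoder's KLEFT vector."""

    def __init__(self, m: int, track_len: int) -> None:
        self.kleft = [m] * track_len       # all m tracks undecoded at every node
        self.frontier = [0] * m            # first not-yet-definite node per track

    def pull_up(self, track: int, new_frontier: int) -> int:
        """Declare nodes [frontier, new_frontier) of `track` definitely decoded.

        Returns the number of newly definite nodes."""
        new_frontier = min(new_frontier, len(self.kleft))
        old = self.frontier[track]
        for node in range(old, new_frontier):
            self.kleft[node] -= 1          # one fewer undecoded track at this node
        if new_frontier > old:
            self.frontier[track] = new_frontier
        return max(0, new_frontier - old)

    def release(self, track: int) -> None:
        """Undo the definitely decoded section of an uncompleted track."""
        for node in range(self.frontier[track]):
            self.kleft[node] += 1
        self.frontier[track] = 0


if __name__ == "__main__":
    book = PullUpBookkeeping(m=3, track_len=6)
    book.pull_up(0, 4)
    book.pull_up(1, 2)
    print(book.kleft)  # → [1, 1, 2, 2, 3, 3]
```

The decoder would then select, at node n of the track it is working on, the likelihood table for k = `kleft[n]` undecoded tracks.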

Computer Simulations.- Many variables affect the performance and practicality of a system 
as complex as bootstrap decoding. Unfortunately, analysis can give only bounds on performance 
for simplified and idealized conditions. Therefore, simulations have been performed to 
determine the gross effect of a number of parameters for the pull-up version and to obtain 
performance figures for a quasi-optimized system that could be considered for possible deep 
space application. 

The simulation program was written in FORTRAN for a 24-bit, 1.75 µs/cycle computer, with
in-line assembly language used to optimize the critical loops. The convolutional code was
restricted to the rate 1/2, constraint length 24 complementary code (taps 51202215 and
66575563) found by Bahl and Jelinek.³ This code was selected because it could be simulated
within a single computer word and is sufficiently powerful (free distance 24, minimum distance
10) that decoder errors do not limit the system. The Fano algorithm was used for sequential
decoding with a simulation speed in excess of 3000 computations per second. All simulations
were run with the bias factor R = 0.5 and a threshold spacing of 3.0. One-dimensional
parameter studies of this system using the BSC concern track length, tracks per block, stopping
rule, and reliability criterion. These tests were run at low values of signal energy per
information bit to noise power spectral density (Eb/N0) so that the effects of the parameters
could be observed near threshold-of-operation conditions.

Track Length.- It is possible (perhaps desirable from a theoretical point of view) that the 
track length be very long for the pull-up algorithm. Other practical considerations, such as 
synchronization, formating, and buffering, require that the track length be reasonably short. 
Figure 2 shows the effect of track length on computation performance, with the number of 
tracks fixed at 7. All tracks for all simulations are terminated by a one-constraint-length tail that 
is included in the rate loss for the code. The value of Eb/N0 was fixed at 3.43 dB, and for
direct comparison, the computation distributions per block were normalized per information bit 
before being plotted. Average computations are shown in the legend. It can be seen that the 
computation performance is degraded for a track length of 300 information bits, but that little 
improvement is actually obtained for a length beyond 500. The track length was fixed at 500 
information bits for all other simulations. 

Tracks per Block.- The rate loss of bootstrap decoding, determined by the number of tracks
per block, m, is significant for small m but it decreases rapidly and then changes relatively little 
as m becomes large. This fact, and the fact that the effect of the channel state stream is 
predominant when the number of tracks is small, suggests that an optimum value for m can be 
found. In addition, the value of m has a direct influence on formating, encoder complexity, 
and decoder buffering. Simulations were conducted to determine the effect of m on the 
computation distributions. The track length was fixed at 500 information bits (plus the 
one-constraint-length tail), and all simulations were carried out for a value of Eb/N0 held
constant at 3.43 dB. Figure 3 is a summary of the results of these simulations, with distributions 
of normalized computations per information bit plotted for selected values of m and a more 
complete table of average computations per information bit given in the legend. The irregular 
variation in computations between values of m is probably due to small sample size 
(3.675 × 10^6 bits per value of m), but a broad minimum is indicated between m = 7 and 11.
Values of m in this range would be practical for operational use. The number of tracks per 
block was fixed at 7 for all other simulations. 
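The rate-loss side of this trade-off is simple arithmetic: with one parity track among m and a tail after each track, the information rate is multiplied by $(m-1)/m$ and by the fraction of information bits per track, and the corresponding $E_b/N_0$ penalty is $-10\log_{10}$ of that product. The Python sketch below assumes 500 information bits and a 24-bit tail per track (our reading of a one-constraint-length tail for the constraint length 24 code); it does not model the computational effect of m, which the simulations measure.

```python
import math


def rate_loss_db(m: int, info_bits: int = 500, tail_bits: int = 24) -> float:
    """Eb/N0 penalty (dB) from one parity track per block and a tail per track.

    Assumption: each track carries info_bits information bits followed by
    tail_bits tail bits (24 assumed for a one-constraint-length tail).
    """
    if m < 2:
        raise ValueError("a block needs at least one data track and one parity track")
    parity_factor = (m - 1) / m
    tail_factor = info_bits / (info_bits + tail_bits)
    return -10.0 * math.log10(parity_factor * tail_factor)


if __name__ == "__main__":
    for m in (3, 5, 7, 9, 11, 15):
        print(m, round(rate_loss_db(m, tail_bits=0), 3), round(rate_loss_db(m), 3))
```

The parity part alone falls from about 1.76 dB at m = 3 to about 0.67 dB at m = 7 and 0.41 dB at m = 11, which illustrates why the loss "decreases rapidly and then changes relatively little."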

Stopping Rule.- An effective stopping rule must be devised in order to obtain the maximum
efficiency of pull-up bootstrap decoding. The sequential decoder should be allowed to operate 
as far as it can go easily. Unnecessary time is wasted in restarting when a track is stopped too 
soon, or in computing, when it is not stopped soon enough. In addition, the stopping rule 
should provide information about the reliability of the path on which the decoder is operating. 
Several rules based upon limiting the number of computations per track were devised and 
tested, but none proved very useful because of the large variation in the number of 
computations for each track. When the computation limit is set low and increased when no
progress has been made on any track, many decoding attempts are required to complete each 
block. Setting the initial limit high to reduce the number of attempts caused long unnecessary 
searches. In addition, computations alone do not provide reliability information. 

The final and most effective rule devised is based solely on observation of the path likelihood 
value. Since the likelihood of the correct path tends to increase with depth in the code tree, the 
rule allows the decoder to operate as long as a drop in the value of the likelihood does not 
exceed a specified value, D. Mechanization of this rule also gives the needed reliability 
information. The decoder keeps track of the maximum likelihood value, Lmax, of any path
visited. Operation is stopped if the decoder attempts to lower the threshold more
than D below Lmax. At this time, the decoder is pointing to a node before the Lmax node
which has a path likelihood approximately D below it. The probability that this node is on the 
correct path increases with increasing values of D. The definitely decoded section is declared to 
extend from the starting point up to LBACK nodes from the stopping point, where LBACK is 
another variable in the stopping rule. 
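The likelihood-drop rule can be expressed as a small monitor that a Fano decoder consults before lowering its threshold. This Python sketch is a simplified illustration, not the simulation code: the names follow the text (D, Lmax, LBACK), while the class and method names are hypothetical, and the surrounding Fano search is not shown.

```python
import math


class DropStopRule:
    """Sketch of the likelihood-drop stopping rule (D, Lmax, LBACK from the text)."""

    def __init__(self, drop_d: float, lback: int) -> None:
        self.drop_d = drop_d
        self.lback = lback
        self.l_max = -math.inf

    def observe(self, path_value: float) -> None:
        """Record the path likelihood value of a newly visited node."""
        self.l_max = max(self.l_max, path_value)

    def may_lower_threshold_to(self, new_threshold: float) -> bool:
        """False means the decoder must stop operating on this track now."""
        return new_threshold >= self.l_max - self.drop_d

    def definitely_decoded_end(self, start_node: int, stop_node: int) -> int:
        """The definitely decoded section runs from start_node up to stop_node - LBACK."""
        return max(start_node, stop_node - self.lback)


if __name__ == "__main__":
    rule = DropStopRule(drop_d=5.0, lback=50)
    for value in (0.0, 3.0, 10.0, 8.0):
        rule.observe(value)
    print(rule.may_lower_threshold_to(6.0), rule.may_lower_threshold_to(4.0))  # → True False
```

The same comparison against Lmax both ends the decoding attempt and bounds how far back a reliable section can be declared, which is why the rule also supplies the reliability information.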

In order to sense stagnation in the decoding process, it is necessary to count the times the 
definitely decoded section is not increased by NPULL nodes for a single decoding attempt. For 
all simulations, NPULL was set to 15. The counter, KROUND, is initially set to 0 and reset each
time that decoding results in more than NPULL definitely decoded nodes. If the KROUND 
count becomes equal to the number of undecoded tracks, thus indicating that no progress can 
be made on any track, the value of D is increased and KROUND reset. At this time, the 
channel state stream is recomputed and decoding is begun from the first node of each 
uncompleted track. This procedure allows for correction of possible errors included in definitely 
decoded sections of the incomplete tracks which may be causing the decoding difficulty. The 
value of D is reset to its initial value each time a track is completely decoded. 

Figures 4 and 5 show the results of simulations for the above scheme. All simulations are for 
500 information bits per track, 7 tracks per block, and Eb/No = 3.43 dB. For these simulations, 
D is determined by multiplying the indicated stop factor by the “disagree, 0 state bit” 
likelihood value for the number of existing uncompleted tracks. Figure 4 shows the effect of 
several stop factor sequences with LBACK = 50, and figure 5 shows the effect of LBACK using 
only the 4, 5, 6, 7 stop factor sequence. It can be seen that an initial stop factor of 3 or 4 
is optimum with an increase of 1 each time stagnation occurs. For these values the stopping 
point does not usually contain errors, and LBACK may be small. 
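The stagnation logic and the choice of D can be combined in one small state object. This Python sketch assumes that the stop factor starts at `initial_factor` (4 in the 4, 5, 6, 7 sequence) and grows by 1 at each stagnation, and that D uses the magnitude of the negative “disagree, 0 state bit” likelihood value; the class and method names are hypothetical.

```python
class StagnationControl:
    """Sketch of the KROUND / NPULL bookkeeping that raises the stop factor for D."""

    def __init__(self, initial_factor: float = 4.0, npull: int = 15) -> None:
        self.initial_factor = initial_factor
        self.stop_factor = initial_factor
        self.npull = npull
        self.kround = 0

    def drop_d(self, disagree_zero_value: float) -> float:
        """D = stop factor x |likelihood value for 'disagree, 0 state bit'|."""
        return self.stop_factor * abs(disagree_zero_value)

    def after_attempt(self, newly_decoded: int, undecoded_tracks: int) -> bool:
        """Update KROUND after one decoding attempt.

        Returns True when D has just been raised; the caller should then recompute
        the state stream and restart each uncompleted track from its first node.
        """
        if newly_decoded > self.npull:
            self.kround = 0
            return False
        self.kround += 1
        if self.kround >= undecoded_tracks:
            self.stop_factor += 1
            self.kround = 0
            return True
        return False

    def track_completed(self) -> None:
        """D returns to its initial value whenever a track is completely decoded."""
        self.stop_factor = self.initial_factor
```

Because the “disagree, 0 state bit” value depends on the number of uncompleted tracks, the caller would look it up for the current k before each call to `drop_d`.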

Performance of Optimum System.- Figure 6 shows the performance of a pull-up bootstrap 
decoder for the BSC. System parameters, chosen near the optimum values determined in the 
previous simulations, were held fixed over the Eb/No range. Although no further attempt was 
made to optimize the system, these curves provide a good measure for comparison with other 
systems. The Pareto slope, α, is plotted as a function of Eb/N0 in figure 7. The Rcomp point is
interpolated to be 3.1 dB. During these simulations 62 decoder errors were observed for the
3.43 dB case. The resulting output bit error rate was about 5 × 10^-6.

It is worthwhile to note here that the power of the code and stopping rule worked very 
effectively in eliminating decoder errors. Numerous errors were inserted in partially completed 
tracks but were removed when the tracks were eventually restarted. The 62 errors occurred in 
one block; 31 were decoded into the second track to be completed and the other 31 were 
forced by parity into the last track. (Weaker codes have been observed to permit more frequent 
errors, which were also duplicated in a second track with no significant effect on the 
computation performance.) 

Figure 8 shows pertinent information about decoder operation for one block of the 
Eb/No = 3.43 dB sample. This block was selected because it shows the decoder trying to 
commit errors (step 7), a change in stopping rule (step 18), the effect of pull-up, and the general
reduction in computations per track as the quantity of definitely decoded data is increased 
(when there are no errors). The step number is KTRY; JNOW is the track being operated on; 
ITCT is the number of computations for the step; IT is the stopping threshold value; ITMX is 
the maximum threshold value; DFAC is the stopping rule likelihood drop factor; NSTART is 
the starting node; N is the stopping node; NMAX is the maximum node depth; KLEFT is the 
number of uncompleted tracks after the decode step; and KROUND is the number of steps 
since pull-up. 

Figure 9 is a plot of the probability that the total number of decoder steps per block will 
exceed a given number for the optimum system with Eb/N0 as a parameter. Note that these
curves exhibit a Pareto-type distribution with a sharp change in slope near the Rcomp point of 
the system. 
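A Pareto slope such as the one plotted in figure 7 can be estimated from recorded computation counts by fitting a straight line to the empirical tail on log-log axes, since $P(C \ge N) \approx A N^{-\alpha}$. The Python sketch below is one way to do this, not necessarily how the report's slopes were obtained; the plotting position $(n - i - 0.5)/n$ for the i-th smallest sample is an assumed convention.

```python
import math


def pareto_slope(samples: list[float], n_min: float) -> float:
    """Estimate alpha in P(C >= N) ~ A * N**(-alpha) from computation counts.

    Fits a least-squares line to log(empirical tail) vs log(N) for samples
    N >= n_min, using the plotting position (n - i - 0.5) / n for the i-th
    smallest sample (an assumed convention).
    """
    data = sorted(samples)
    n = len(data)
    points = [(math.log(c), math.log((n - i - 0.5) / n))
              for i, c in enumerate(data) if c >= n_min]
    if len(points) < 2:
        raise ValueError("need at least two samples at or above n_min")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic Pareto-distributed counts with alpha = 1.3 (not simulation data).
    n = 2000
    counts = [10.0 * ((i + 0.5) / n) ** (-1 / 1.3) for i in range(n)]
    print(round(pareto_slope(counts, n_min=10.0), 3))
```

Restricting the fit with `n_min` keeps the estimate to the tail region; a change in slope like the one near the Rcomp point would show up as a dependence of the estimate on `n_min`.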



Comparison With Other Systems.- It is interesting to compare bootstrap decoding with two
other decoding techniques because of their similarities. The first is simple sequential decoding. 
To provide a means for direct comparison, simulations were performed for the same Eb/No 
values as were used for the bootstrap decoder. The same Fano algorithm, track size, and rate 1/2 
convolutional code were used. The results are shown in the normalized computations curves of 
figure 10 with the Pareto slope values plotted in figure 7. Rcomp is at approximately 4.6 dB. 
Bootstrap decoding has a gain of about 1.5 dB over simple sequential decoding. 
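The sequential-decoding figure can be cross-checked against the textbook cutoff rate of the hard-decision channel, $R_0 = 1 - \log_2\left[1 + 2\sqrt{p(1-p)}\right]$ with $p = Q(\sqrt{2 R E_b/N_0})$ for antipodal signaling. The Python sketch below (illustrative names) bisects for the $E_b/N_0$ at which $R_0$ equals the code rate 1/2. Under these idealized assumptions it returns about 4.6 dB, close to the simulated Rcomp point for simple sequential decoding; it is an independent check, not the paper's computation.

```python
import math


def bsc_crossover(eb_n0_db: float, rate: float) -> float:
    """Hard-decision crossover probability for antipodal signaling on AWGN."""
    es_n0 = rate * 10.0 ** (eb_n0_db / 10.0)   # energy per transmitted bit / N0
    return 0.5 * math.erfc(math.sqrt(es_n0))    # Q(sqrt(2 Es/N0))


def r0_bsc(p: float) -> float:
    """Computational cutoff rate R0 (bits per channel use) of a BSC."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))


def eb_n0_at_rcomp(rate: float = 0.5, lo: float = -2.0, hi: float = 12.0, iters: int = 60) -> float:
    """Bisect for the Eb/N0 (dB) at which R0 of the hard-decision BSC equals the code rate."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r0_bsc(bsc_crossover(mid, rate)) < rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(eb_n0_at_rcomp(0.5), 2))
```

Bisection is sufficient here because $R_0$ increases monotonically with $E_b/N_0$ over the search interval.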

In order to determine the exact effect of the channel state stream, the pull-up decoder was 
modified to use standard likelihood values when k ranges from 2 to 7 so the channel state 
stream is useless, except to pull up the track which is farthest behind the others. Consequently, 
the algorithm actually behaves like the Falconer algorithm for a 7-bit parity check code, with 
the exception that the decoder is restarted from the first undecoded node at each decoding 
attempt. The computation results of these simulations are shown in figure 11 with the Pareto
slope values plotted in figure 7. This algorithm has an Rcomp of about 4.1 dB, which is only
0.5 dB better than simple sequential decoding. The use of the channel state stream therefore
yields a rather inexpensive 1.0 dB gain.

Extension to Quantized Channel. - Bootstrap decoding would be of little use if it were 
applicable only to binary output channels since nearly 2 dB can be gained for simple sequential 
decoding if the output is quantized to eight levels. Jelinek has provided such an extension for 
the bootstrap decoder.¹ Unfortunately, to make full use of the information provided by the
quantized symbols, a large amount of time is required to compute channel state values, which 
are no longer binary. Excessive computing time, coupled with the large likelihood tables 
required (15,280 entries for 8-level quantization and 7 tracks), probably makes such a scheme 
impractical. 

Fortunately, there is a compromise - to use the quantized values of the track symbols and 
maintain only a binary channel state stream. If the receiver outputs are broken into sign and 
quality bits, u and v, then the channel state values, z, are modulo-two sums of u, as before. 
Then, 

$$\lambda_m = \log_2 \frac{w_m(u,v,z \mid x)}{w_m(u,v,z)} - R$$

where

$$w_m(u,v,z) = \tfrac{1}{2}\left[w_m(u,v,z \mid 0) + w_m(u,v,z \mid 1)\right]$$

and

$$\begin{aligned}
w_m(0,v,0 \mid 0) &= w_m(1,v,0 \mid 1) = w(0,v \mid 0)\, q_{m-1}(0)\\
w_m(0,v,1 \mid 0) &= w_m(1,v,1 \mid 1) = w(0,v \mid 0)\, q_{m-1}(1)\\
w_m(1,v,0 \mid 0) &= w_m(0,v,0 \mid 1) = w(1,v \mid 0)\, q_{m-1}(1)\\
w_m(1,v,1 \mid 0) &= w_m(0,v,1 \mid 1) = w(1,v \mid 0)\, q_{m-1}(0)
\end{aligned}$$

$q_k(z)$ is defined as before, and

$$p = \sum_v w(1,v \mid 0).$$

According to theoretical bounds derived by Jelinek,¹ full use of the 8-level channel gives an
additional gain of about 1.7 dB over the BSC for rate 1/2 bootstrap decoding. Using a binary
state stream for this channel causes a theoretical degradation of only 0.1 dB, which is a small
price to pay since the channel state computation and likelihood look-up are direct and the table
size is only four times larger than for the BSC.
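The binary-state-stream metric can be tabulated directly from the formulas above. The sketch below is illustrative only and rests on assumptions that this section does not state: x = 0 is sent as +A with A = sqrt(2Es/N0) and unit noise variance; the eight levels use thresholds 0, ±0.5σ, ±σ, ±1.5σ; the quality value v indexes the magnitude bin; and q_k(z) is taken to be the probability that the modulo-two sum of k independent sign errors, each of probability p, equals z. All helper names are hypothetical.

```python
import math
from itertools import product

def phi(z):
    """Standard normal CDF (math.erf handles z = +/-inf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def octal_channel(es_n0_db, spacing=0.5):
    """w[(u, v)] = P(sign bit u, quality v | x = 0) for an 8-level quantizer."""
    amp = math.sqrt(2.0 * 10 ** (es_n0_db / 10.0))    # assumed signal level, sigma = 1
    edges = [0.0, spacing, 2 * spacing, 3 * spacing, math.inf]
    w = {}
    for v in range(4):
        lo, hi = edges[v], edges[v + 1]
        w[(0, v)] = phi(hi - amp) - phi(lo - amp)      # y in [lo, hi): sign agrees
        w[(1, v)] = phi(-lo - amp) - phi(-hi - amp)    # y in (-hi, -lo]: sign disagrees
    return w

def parity_prob(p, k):
    """Assumed q_k(z): P(mod-2 sum of k independent sign errors equals z)."""
    q1 = 0.5 * (1.0 - (1.0 - 2.0 * p) ** k)
    return {0: 1.0 - q1, 1: q1}

def metric_table(w, m, rate):
    """lambda_m(u, v, z | x) in bits, for m >= 2 undecoded tracks."""
    p = sum(w[(1, v)] for v in range(4))               # p = sum_v w(1, v | 0)
    q = parity_prob(p, m - 1)

    def wm(u, v, z, x):
        # w_m(u,v,z|0) = w(u,v|0) q_{m-1}(z xor u); w_m(u,v,z|1) = w_m(1-u,v,z|0)
        uu = u ^ x
        return w[(uu, v)] * q[z ^ uu]

    table = {}
    for u, v, z, x in product((0, 1), range(4), (0, 1), (0, 1)):
        avg = 0.5 * (wm(u, v, z, 0) + wm(u, v, z, 1))   # w_m(u, v, z)
        table[(u, v, z, x)] = math.log2(wm(u, v, z, x) / avg) - rate
    return table

tab = metric_table(octal_channel(es_n0_db=0.0), m=7, rate=0.5)
print(len(tab))  # 32 = 2 (u) x 4 (v) x 2 (z) x 2 (x) entries per value of m
```

A BSC table is indexed by (u, z, x) only, so the extra quality index v is what makes the octal table four times larger.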

The simulation program was modified for the quantized channel with binary state stream with
no significant change in speed. Simulations were performed for eight levels of output with
quantization spacing of 0.5σ for all Eb/No. Tests were conducted which determined the
optimum values for the stop factor sequence to be 2.0, 2.5, 3.0, 3.5 times the “strongest
disagree, 0 state bit” likelihood with LBACK = 10, 7 tracks, 500 information bits per track, and
Eb/No = 1.91 dB. Extensive computer runs were made under these conditions for a range of
Eb/No values. The resulting computation performance curves are shown in figure 12. The
observed Pareto slopes are plotted in figure 7 for comparison with the other simulations. The
interpolated Rcomp point is at 1.7 dB, a gain of 1.4 dB over the BSC and 1.0 dB better than
rate 1/2 sequential decoding using the octal channel. Figure 7 also shows an interesting
thresholding effect for the codes plotted: the threshold is approached more sharply as code
power increases. Over 27,000 blocks were run for the 1.91 dB case (near the threshold of
operation) in order to look for any peculiar deviation in computations performance for low
probabilities of C > T. The Pareto slope remained constant over the significant range. For this
case, 190 bit errors were observed in 4 blocks for a probability of bit error less than 2.5 × 10^-6.
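As an arithmetic check of the quoted error rate: because "over 27,000" blocks were run, using exactly 27,000 slightly overstates the rate. Whether the parity track's 500 bits count as information bits is not stated here, so the sketch below (a hypothetical helper) shows both counts.

```python
def bit_error_rate(bit_errors, blocks, info_tracks, bits_per_track=500):
    """Observed bit error rate: errors divided by decoded information bits."""
    return bit_errors / (blocks * info_tracks * bits_per_track)

# 190 errors in (at least) 27,000 blocks of 500 bits per track.
for tracks in (7, 6):  # 7: every track counted; 6: parity track excluded (assumption)
    print(tracks, f"{bit_error_rate(190, 27_000, tracks):.2e}")
# 7 2.01e-06
# 6 2.35e-06
```

Either way the estimate stays below 2.5 × 10^-6, consistent with the text.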

Conclusions. — Simulations have provided a great deal of experience with the bootstrap 
decoding algorithm. Although a number of questions remain unanswered (e.g., effects of 
channel memory and likelihood/channel mismatch), it is clear that this technique offers a gain 
of about 1.0 dB over that obtainable from sequential decoding alone. Bootstrap decoding has 
been shown to operate under the constraints imposed by digital communication systems, such 
as those typical of deep space. A bootstrap decoding system would be relatively complex, but 
appears suitable for low-to-moderate data rates where the value of 1.0 dB is worth the cost of 
implementation. 
Figure Captions

Fig. 1 - Likelihood values λ_k for p = 0.09 and R = 0.0.

Fig. 2 - Pull-up decoder computations performance as a function of track length. 

Fig. 3 - Pull-up decoder computations performance as a function of tracks per block. 

Fig. 4 - Pull-up decoder computations performance as a function of stop factor. 

Fig. 5 - Pull-up decoder computations performance as a function of LBACK. 

Fig. 6 - Optimized pull-up decoder computations performance for the BSC as a function of Eb/No.

Fig. 7 - Pareto exponent vs. E b /N 0 for several decoding techniques. 

Fig. 8 - Sample program output. 

Fig. 9 - Probability that the number of decode steps will exceed K for the optimized pull-up decoder as a function of Eb/No.


Fig. 10 - Simple sequential decoder computations performance for the BSC as a function of Eb/No.
Fig. 11 - Pseudo Falconer decoder computations performance for the BSC as a function of Eb/No.


Fig. 12 - Pull-up decoder computations performance for the octal channel as a function of Eb/No.
II-D. Effect of Likelihood Bias on Sequential Decoding Parameters
1. Introduction 


The performance of sequential decoding has traditionally been 
evaluated in terms of three characteristics: the probability of undetectable 
error ([1], p. 349), the probability of failure of order t ([1], p. 349), and
the Pareto exponent associated with the decoding effort ([1], p. 349). Most 
published bounds on these quantities assume that the decoder uses the 
likelihood metric 

$$\log_2 \frac{w(y \mid x)}{w(y)} - R \tag{1}$$

where R is the rate of the code used, w(y/x) is the channel transmission 
probability function, and w(y) is the marginal probability distribution of 
received digits based on the optimal code ensemble. It is generally known 
[1] that the three quantities of interest are optimized by the metric form 


$$\log_2 \frac{w(y \mid x)}{w(y)} - G \tag{2}$$


where the optimal value of G may be different for each of the three cases. 
For instance, Zigangirov [2] manipulates G to minimize the probability 
of failure, and Stiglitz and Yudkin explore some effects of G -variation in 
an unpublished memorandum [3]. However, their use of simplifying
inequalities at certain critical points of their development prevents them 
from obtaining the strongest achievable results. 
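For the binary symmetric channel, metrics (1) and (2) reduce to two numbers per channel digit. A minimal sketch, assuming equiprobable inputs (so w(y) = 1/2) and treating R and G as per-channel-digit quantities; the function names are illustrative.

```python
import math

def bsc_digit_metrics(p, bias):
    """Per-digit metric log2[w(y|x)/w(y)] - bias for a BSC with uniform inputs.

    With equiprobable inputs w(y) = 1/2, so a received digit that agrees with
    the hypothesized code digit scores log2(2(1-p)) - bias and one that
    disagrees scores log2(2p) - bias. bias = R gives metric (1); any other
    value gives the biased metric (2).
    """
    return math.log2(2 * (1 - p)) - bias, math.log2(2 * p) - bias

def branch_metric(received, hypothesis, p, bias):
    """Likelihood increment of one branch (n channel digits)."""
    agree, disagree = bsc_digit_metrics(p, bias)
    return sum(agree if r == h else disagree for r, h in zip(received, hypothesis))

print(branch_metric([0, 1], [0, 0], p=0.25, bias=0.5))  # log2(1.5) - 2, about -1.415
```

Raising G lowers every branch increment by the same amount per digit, which is the single degree of freedom this paper studies.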

The trade-off between the three performance parameters is interesting 
from the point of view of Bootstrap Hybrid Decoding [4]. In one mode of
the pull-up version of the algorithm, digits of branch depth J - t and less are
definitely decoded if the deepest penetration of the decoder was to branch level J.



Making the retreat length t as short as possible will tend to decrease the 
decoding effort as long as no error at depth J-t or less was committed. 
Otherwise the definite decision will have possibly catastrophic 
consequences. Hence all other things being equal, G should be adjusted 
so as to minimize the probability of failure. We will see below that 
usually such setting will lower somewhat the Pareto exponent of the 
sequential decoding component of the scheme, and will increase the 
probability of undetectable error. The latter difficulty may be cheaply 
remedied by an increase in constraint length, but what the best compromise
is between the failure and Pareto exponent parameters remains an open 
question. 

A second mode of the pull-up version of Bootstrap Decoding 
definitely decodes digits by the following rule: Let the decoder be located 

at some node whose likelihood is L and let the path leading to that node 
contain some node n* at depth t whose cumulative likelihood does not 
exceed L-a. Then the decoder will definitely decide to release to the 
user all t branches of the path leading to n*. How to set the value of the 
likelihood drop a depends on Q(a), the probability that with zero likelihood 
value assigned to a root node, there exists a node in the incorrect subset 
whose cumulative likelihood exceeds a. Q(a) is thus a fourth performance 
parameter of interest. 
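The second release rule can be sketched as a scan along the current path. This is a hypothetical helper, not the bootstrap program. The text only requires that some node n* satisfy the drop condition; using the deepest such node, so that the most branches are released, is an assumption.

```python
def release_depth(path_likelihoods, current_likelihood, drop):
    """Number of branches t that may be released under the likelihood-drop rule.

    path_likelihoods[t] is the cumulative likelihood of the node at depth t on
    the path leading to the decoder's current node (t = 0 is the root). A node
    at depth t qualifies if its likelihood does not exceed L - a; the deepest
    qualifying node is used. Returns 0 when no node beyond the root qualifies.
    """
    threshold = current_likelihood - drop
    depth = 0
    for t, value in enumerate(path_likelihoods):
        if value <= threshold:
            depth = t
    return depth

path = [0.0, 1.2, 0.4, 2.0, 3.1, 2.5, 4.0]   # hypothetical likelihoods along the path
print(release_depth(path, current_likelihood=4.0, drop=2.0))  # 3
```

A larger drop a releases information later but, through Q(a), with a smaller chance that a wrong path had overtaken the released one.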

This paper attempts to determine the effects of G-variations on the 
four performance characteristics. In sections 2 through 5 we deal with 
random coding upper bounds. In sections 6 and 7 we develop expurgated 



bounds for the probabilities of failure and undetected error. We show 
that the former is identical to the one developed by Viterbi and Odenwalder 
[10] for maximum likelihood decoding, and that the latter leads to the
block coding expurgated exponent. In section 8 we present some curves 
that apply our results to quantized Gaussian additive noise channels with 
binary inputs. 



2. Definitions and Basic Upper Bounds 


As is usual, we will work with the random coding ensemble and we
will not bother to argue that the obtained bounds are simultaneously
valid for particular codes as well. To save space, we will use the
notation and some of the intermediate results from Chapter 10 of
Jelinek [1]. However, to simplify matters further, we will adopt the
stack sequential decoding algorithm [5] that leads asymptotically to the
same results as the Fano algorithm. The reader will be assumed
familiar with both. Our random codes of rate $k/n$ will have the trellis
structure of Figure 1 (see also p. 336 of [1]) with $2^k$ branches leaving
each node, each branch associated with a block of $n$ channel input digits
$x$ (in Figure 1, $k = 1$ and $n = 2$, and the channel input alphabet is binary).
Each level of the trellis will contain $2^{k(u-1)}$ states, where $u$ is called the
branch constraint length of the code. The information digits that determine
the path that the encoder takes through the trellis are binary, the state
being determined uniquely by the $k(u-1)$ most recent bits (by convention,
the information preceding time $t = 0$ is assumed to consist of 0's). In
the random ensemble, each digit of each branch of the trellis is
selected independently, at random, with some probability distribution
$r(x)$ over the channel inputs. The coding trellis generates a coding tree
whose root node corresponds to the initial all-zero trellis state. In
this paper we will consider infinite depth trees and trellises.
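One member of this ensemble can be sampled lazily, which makes the "infinite depth" trellis concrete. The sketch below assumes a binary channel alphabet with uniform r(x). The class name is hypothetical. Because labels are drawn on first use, the particular code drawn depends on the order in which branches are visited. Each sampled code is still a valid random draw.

```python
import random

class RandomTrellisCode:
    """One randomly drawn member of the random trellis-code ensemble.

    k information bits enter per branch, each branch carries n channel digits,
    and the state is the k*(u-1) most recent information bits. Every digit of
    every branch, identified by (depth, state, k-bit input), is drawn
    independently from r(x), here assumed uniform on {0, 1}.
    """

    def __init__(self, k, n, u, seed=0):
        self.k, self.n = k, n
        self.state_bits = k * (u - 1)
        self.rng = random.Random(seed)
        self.labels = {}

    def num_states(self):
        return 2 ** self.state_bits

    def branch(self, depth, state, symbol):
        key = (depth, state, symbol)
        if key not in self.labels:
            self.labels[key] = [self.rng.randint(0, 1) for _ in range(self.n)]
        return self.labels[key]

    def next_state(self, state, symbol):
        """Shift the new k-bit symbol into the k*(u-1)-bit state register."""
        mask = (1 << self.state_bits) - 1
        return ((state << self.k) | symbol) & mask

    def encode(self, symbols):
        """Code digits for a sequence of k-bit symbols, from the all-zero state."""
        state, digits = 0, []
        for depth, symbol in enumerate(symbols):
            digits.extend(self.branch(depth, state, symbol))
            state = self.next_state(state, symbol)
        return digits

code = RandomTrellisCode(k=1, n=2, u=3, seed=1)
wrong = code.encode([1, 0, 0, 0, 0])
right = code.encode([0, 0, 0, 0, 0])
print(wrong[6:] == right[6:])  # True: the paths share the trellis state from depth 3 on
```

The merge of the two paths once the incorrect digit leaves the state register is the trellis property that the sets of rejoining paths defined below rely on.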

An undetectable error is committed at depth $i$ by a sequential decoder
if, after it has operated without any restriction on the number and depth of
returns, the $i$th branch on the finally decoded trellis path differs from
the one actually taken by the encoder. We will be interested in $U(u)$,
the average number of undetectable errors per decoded digit when a
random code of constraint length $u$ was used.

A failure of order $t$ takes place if the decoder advances by $t$
branches or more into the incorrect subset of the coding tree. We will
be interested in the probability of failure $P_f(t)$.

Let $N_i$ be the number of times the sequential decoder is located at
some node of the incorrect subset stemming from a correct node on
level $i$. Then $\overline{N_i^\gamma}$ is the $\gamma$th moment of the decoding effort at depth $i$.

Let $\alpha$ be the supremum of the values $\gamma$ for which $\overline{N_i^\gamma}$ is bounded; $\alpha$
is then called the Pareto exponent of the decoding effort.

In the preceding section we have defined Q(a), the probability of an 
a-likelihood advance in the incorrect subset. 

Let $\mathbf{s} = (s_1, s_2, \ldots)$ denote some path in the tree determined by
information digits $s_i$, $i = 1, 2, \ldots$, let $\mathbf{s}^*$ be the correct path, and let
$\mathbf{x}_t(\mathbf{s})$ denote the code digits corresponding to the initial $t$ branches of $\mathbf{s}$.
Let $\mathcal{P}_t$ denote the set of nodes at depth $t$ of the incorrect subset stemming
from the root node, and let $\mathcal{G}_{t+u}$ be the subset of nodes of $\mathcal{P}_{t+u}$ that
corresponds to trellis paths whose first branch is incorrect and which
rejoin the correct path for the first time at depth $u+t$ (i.e., these are
paths containing at most $t$ incorrect information digits).

The following upper bounds have been proved in Jelinek [1], pp.
354-359 (we have made some adjustments to assure applicability to the
stack algorithm), where $\sigma > 0$ and $\gamma > 0$:
An upper bound on $\overline{N^\gamma}$ for $\gamma > 1$ has also been derived by Jelinek [6].
However, the purpose of this paper is to investigate the effect of not
taking certain usual bounding shortcuts which alone make the bound on
$\overline{N^\gamma}$ for $\gamma > 1$ tractable. We will therefore restrict ourselves to the
case $\gamma \le 1$, which is the one for which optimal choice of $G$ is crucial. An
adventurous reader may in any case decide to use our conclusions as a
guide for action in the region $\gamma > 1$.

Before bounding Q(a) let us observe that (4) and (5) have the 
expectation term in common and that the expectation term in (3) is 
similar. In Appendix I we have bounded these as follows. 

Define the exponent functions 
$$\text{(expectation term)} \le \begin{cases} 2^{n[(m-t) f_1(\sigma\gamma) + t f_2(\sigma,\gamma) + \ell(B)\gamma R]}, & m \ge t \\ 2^{n[(t-m) f_3(\sigma,\gamma) + m f_2(\sigma,\gamma) + \ell(B)\gamma R]}, & m < t \end{cases} \tag{9}$$

where $B$ is either equal to $\mathcal{P}_t$ or to $\mathcal{G}_t$, and $\ell(\mathcal{P}_t) = t$, $\ell(\mathcal{G}_t) = t - u$.
3. Random Coding Upper Bounds on Performance Parameters

In this section we use (9) to obtain upper bounds on $U(u)$, $P_f(t)$, and
$\overline{N^\gamma}$ and develop an upper bound on $Q(a)$. Substituting (9) into (3) we get


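$$U(u) \le \sum_{t=1}^{\infty} \sum_{m=t+u}^{\infty} t\, 2^{n[\delta\sigma(m-t-u)G + t\delta R + (t+u) f_2(\sigma,\delta) + (m-t-u) f_1(\sigma\delta)]} + \sum_{t=1}^{\infty} \sum_{m=0}^{t+u-1} t\, 2^{n[\delta\sigma(m-t-u)G + t\delta R + m f_2(\sigma,\delta) + (u+t-m) f_3(\sigma,\delta)]} \tag{10}$$

where $\delta \in [0, 1]$, $\sigma > 0$. Using the geometric sum formula, the first term
in (10) is bounded by $K_1 2^{nu f_2(\sigma,\delta)}$, where $K_1$ is finite provided

$$\delta\sigma G + f_1(\sigma\delta) < 0, \qquad \delta R + f_2(\sigma,\delta) < 0. \tag{11}$$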


It is best to break up the second sum in (10) into two parts, the first for
$m \in [0, u-1]$ and the second for $m \in [u, u+t-1]$. The first part is then equal
to

$$2^{nu[f_3(\sigma,\delta) - \delta\sigma G]} \sum_{m=0}^{u-1} 2^{nm[\delta\sigma G + f_2(\sigma,\delta) - f_3(\sigma,\delta)]} \cdot \sum_{t=1}^{\infty} t\, 2^{nt[f_3(\sigma,\delta) + \delta R - \sigma\delta G]} \tag{12}$$


The result then depends on whether the exponent in the summation over $m$
is positive or negative. Thus the bound is $K_2 2^{nu f_2(\sigma,\delta)}$, where $K_2$ is
finite provided

$$\delta\sigma G + f_2(\sigma,\delta) - f_3(\sigma,\delta) > 0, \qquad f_3(\sigma,\delta) + \delta R - \sigma\delta G < 0 \tag{13}$$

and it is $K_3 2^{nu[f_3(\sigma,\delta) - \sigma\delta G]}$, where $K_3$ is finite provided

$$\sigma\delta G + f_2(\sigma,\delta) - f_3(\sigma,\delta) < 0, \qquad f_3(\sigma,\delta) + \delta R - \sigma\delta G < 0. \tag{14}$$

The second part of the second sum in (10) is equal to

$$2^{nu[f_3(\sigma,\delta) - \sigma\delta G]} \sum_{m=u}^{\infty} 2^{nm[\sigma\delta G + f_2(\sigma,\delta) - f_3(\sigma,\delta)]} \sum_{t=m-u+1}^{\infty} t\, 2^{nt[f_3(\sigma,\delta) + \delta R - \sigma\delta G]} \tag{15}$$

which is bounded by $K_4 2^{nu f_2(\sigma,\delta)}$ provided

$$f_3(\sigma,\delta) + \delta R - \sigma\delta G < 0, \qquad \delta R + f_2(\sigma,\delta) < 0. \tag{16}$$

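Now the last two constraints of the set

$$\begin{aligned}
\sigma\delta G + f_1(\sigma\delta) &< 0 && \text{(17a)}\\
\delta R + f_2(\sigma,\delta) &< 0 && \text{(17b)}\\
f_3(\sigma,\delta) - f_2(\sigma,\delta) - \sigma\delta G &< 0 && \text{(17c)}
\end{aligned}$$

imply the second constraint in (13), so that (17) is equivalent to (11), (13),
and (16). Similarly, the last two constraints of the set

$$\begin{aligned}
\sigma\delta G + f_1(\sigma\delta) &< 0 && \text{(18a)}\\
\sigma\delta G + f_2(\sigma,\delta) - f_3(\sigma,\delta) &< 0 && \text{(18b)}\\
f_3(\sigma,\delta) + \delta R - \sigma\delta G &< 0 && \text{(18c)}
\end{aligned}$$

imply the second constraint of (11), so that (18) is equivalent to (11), (14),
and (16). We thus get the bound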


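$$U(u) \le \begin{cases} K_5\, 2^{nu f_2(\sigma,\delta)} & \text{if (17) holds} \quad \text{(19a)}\\ K_6\, 2^{nu[f_3(\sigma,\delta) - \sigma\delta G]} & \text{if (18) holds} \quad \text{(19b)} \end{cases}$$

if $\sigma > 0$, $\delta \in [0, 1]$, where the second exponent was obtained with the help of
the inequality (18b).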

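Substituting (9) into (4) we get that if $\sigma > 0$, $\delta \in [0, 1]$, then

$$P_f(t) \le 2^{nt[f_3(\sigma,\delta) + \delta R - \delta\sigma G]} \sum_{m=0}^{t} 2^{nm[\sigma\delta G - f_3(\sigma,\delta) + f_2(\sigma,\delta)]}.$$

Therefore

$$P_f(t) \le \begin{cases} K_7\, 2^{nt[f_2(\sigma,\delta) + \delta R]} & \text{if (21) holds} \quad \text{(20a)}\\ K_8\, 2^{nt[f_3(\sigma,\delta) + \delta R - \delta\sigma G]} & \text{if (21) does not hold} \quad \text{(20b)} \end{cases}$$

where

$$\sigma\delta G - f_3(\sigma,\delta) + f_2(\sigma,\delta) > 0. \tag{21}$$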

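Substituting finally (9) into (5) we get, for $\sigma > 0$, $\gamma \in [0, 1]$,

$$\overline{N^\gamma} \le \sum_{t=0}^{\infty} \sum_{m=t}^{\infty} 2^{n[\gamma\sigma(m-t)G + (m-t) f_1(\sigma\gamma) + t f_2(\sigma,\gamma) + \gamma t R]} + \sum_{m=0}^{\infty} \sum_{t=m+1}^{\infty} 2^{n[\gamma\sigma(m-t)G + (t-m) f_3(\sigma,\gamma) + m f_2(\sigma,\gamma) + \gamma t R]} \tag{22}$$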


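The first sum in (22) converges provided

$$\gamma\sigma G + f_1(\sigma\gamma) < 0, \qquad \gamma R + f_2(\sigma,\gamma) < 0,$$

while the second sum converges provided

$$f_3(\sigma,\gamma) + \gamma R - \gamma\sigma G < 0, \qquad \gamma R + f_2(\sigma,\gamma) < 0.$$

We therefore conclude that

$$\overline{N^\gamma} < K_9 \tag{23}$$

where $K_9$ is finite if

$$\begin{aligned}
\gamma\sigma G + f_1(\sigma\gamma) &< 0 && \text{(24a)}\\
\gamma R + f_2(\sigma,\gamma) &< 0 && \text{(24b)}\\
f_3(\sigma,\gamma) + \gamma R - \gamma\sigma G &< 0 && \text{(24c)}
\end{aligned}$$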


We conclude this section by upper bounding Q(a). We do so using a 
difference equation method pioneered by Zigangirov [7]. 

Consider the partial tree of Figure 2 all of whose branches are in the
incorrect subset, with $d = 2^k$ branches leaving all but the first node (in
Figure 2, $k = 2$). Let $\beta$ be the cumulative likelihood value of the first
node and $\Lambda$ the likelihood of the branch emanating from it. Let $F_a(\beta)$ be
the probability that at least one of the nodes of the tree of Figure 2 has a
cumulative likelihood that exceeds the value $a$, given that the initial node
had likelihood $\beta$. $F_a(\beta)$ then satisfies the difference equation

$$1 - F_a(\beta) = \sum_{\Lambda} P(\Lambda)\,\left[1 - F_a(\beta + \Lambda)\right]^d \tag{25}$$

where $P(\Lambda)$ denotes the probability that a branch has likelihood $\Lambda$, and by
definition

$$F_a(\beta) = 1 \quad \text{for } \beta > a. \tag{26}$$



Because $\sum_\Lambda P(\Lambda) = 1$, it follows from (25) that

$$F_a(\beta) \le d \sum_{\Lambda} P(\Lambda)\, F_a(\beta + \Lambda). \tag{27}$$

Let $F_a^*(\beta)$ be any function satisfying (27) such that

$$F_a^*(\beta) > 1 \quad \text{for } \beta > a; \tag{28}$$

then it is well known (see [8], pp. 281-282) that

$$F_a(\beta) \le F_a^*(\beta). \tag{29}$$

We will try $F_a^*(\beta) = 2^{s(\beta - a)}$ with $s$ chosen so that (27) is satisfied with
equality. Thus we desire

$$2^{s(\beta - a)} = d \sum_{\Lambda} P(\Lambda)\, 2^{s(\beta + \Lambda - a)}$$

or

$$1 = d \sum_{\Lambda} P(\Lambda)\, 2^{s\Lambda}. \tag{30}$$

Using the metric formula (2) and the fact that $d = 2^{nR}$, (30) becomes

$$sG - R - f_1(1-s) = 0. \tag{31}$$

The relation between $Q(a)$ and $F_a(\beta)$ is, of course,

$$Q(a) = 1 - \left[1 - F_a(0)\right]^{d-1} \le (d-1)\, F_a(0) \tag{32}$$

so that

$$Q(a) \le (d-1)\, 2^{-sa} \tag{33}$$

where $s$ is the maximum value satisfying (31).
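Equation (31) can be solved numerically once f_1 is fixed. The definition of f_1 is not reproduced in this section, so the form used below is an assumption: f_1(λ) = log2 Σ_x Σ_y r(x) w(y|x)^(1-λ) w(y)^λ. It is chosen because it turns (30) into (31) exactly and gives f_1(0) = 0 and -f_1'(0) = I(X;Y), as the text states later. The sketch specializes this form to a BSC with uniform inputs. It handles only G below the limit G+ discussed later, and the function names are hypothetical.

```python
import math

def f1_bsc(lam, p):
    """Assumed f_1(lam) = log2 sum_x sum_y r(x) w(y|x)**(1-lam) w(y)**lam
    for a BSC with crossover p and uniform r(x), where w(y) = 1/2."""
    return -lam + math.log2((1 - p) ** (1 - lam) + p ** (1 - lam))

def largest_s(G, R, p, s_hi=64.0, iters=200):
    """Largest s > 0 solving s*G - R - f_1(1 - s) = 0, i.e. equation (31).

    g(s) = s*G - R - f_1(1 - s) is concave because f_1 is convex, so a
    ternary search finds its peak and bisection on the falling side finds
    the larger root. Returns None if g never reaches zero.
    """
    def g(s):
        return s * G - R - f1_bsc(1 - s, p)

    if g(s_hi) >= 0:
        raise ValueError("g(s_hi) >= 0: raise s_hi or check that G < G+")
    lo, hi = 0.0, s_hi
    for _ in range(iters):                      # ternary search for the peak of g
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    peak = 0.5 * (lo + hi)
    if g(peak) < 0:
        return None
    lo, hi = peak, s_hi                          # invariant: g(lo) >= 0 > g(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def q_bound(a, n, R, s):
    """Bound (33) with d = 2**(n R): Q(a) <= (d - 1) * 2**(-s a)."""
    return (2 ** (n * R) - 1) * 2 ** (-s * a)

s_star = largest_s(G=0.5, R=0.5, p=0.05)
print(round(s_star, 9))  # 1.0, since G = R makes s = 1 a root of (31)
```

Increasing G above R moves the larger root above 1, which tightens (33) for a fixed a.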





4. Optimization of the Random Coding Bounds

In this section we will choose the various values of $G$ that optimize
the bounds on $U(u)$, $P_f(t)$, $\overline{N^\gamma}$, and $Q(a)$. These should be expected to be
different for the four cases. In the next section we will choose the best
values of $\sigma$ and $\delta$ for fixed $G$.

Our analysis will presuppose a constant value of the source
distribution $r(x)$. Most channels of interest are symmetrical and for them
the best $r(x)$ is uniform. For other channels $r(x)$ should be optimized, but
we will not concern ourselves with this problem (see Chapter 7 of [1]). In
fact, in general different distributions $r(x)$ would optimize the bounds on
$P_f(t)$, $U(u)$, $Q(a)$, and $\overline{N^\gamma}$.

First, consider the bound (19a). Our approach to its optimization is
to choose for a fixed $\delta$ values of $\sigma$ and $G$ that will allow satisfaction of
(17) by the maximum value of $R$. In this way a parametric relation (in $\delta$)
between $R$ and the exponent $-f_2(\sigma(\delta), \delta)$ will be obtained. If an increase
in $R$ leads to a decrease in $-f_2(\sigma(\delta), \delta)$, the bound will be optimized.

Now $R$ is maximized (see (17b)) by maximizing $-f_2(\sigma, \delta)$ and then choosing
$G$ that would satisfy (17a) and (17c). Straightforward calculus shows that
the desired value is

$$\sigma = \frac{1}{1+\delta} \tag{34}$$

so that the choice of $G$ is

$$\frac{1+\delta}{\delta}\left[f_3\!\left(\tfrac{1}{1+\delta}, \delta\right) - f_2\!\left(\tfrac{1}{1+\delta}, \delta\right)\right] \le G \le -\frac{1+\delta}{\delta}\, f_1\!\left(\frac{\delta}{1+\delta}\right). \tag{35}$$

We show in Appendix II that indeed the righthand side of (35) is at
least as large as the lefthand side. It is interesting to note from (7) that

$$-f_2\!\left(\frac{1}{1+\delta}, \delta\right) = -\log_2 \sum_y \left[\sum_x w(y \mid x)^{\frac{1}{1+\delta}}\, r(x)\right]^{1+\delta} \triangleq E_o(\delta) \tag{36}$$

where $E_o(\delta)$ is the well known exponent function of Gallager [9]. The
desired maximal value of $R$ is then $\frac{1}{\delta} E_o(\delta)$. Since $\delta$ is restricted to the
range $[0, 1]$, it remains to treat the case of $R < E_o(1)$.

Since the maximum of $E_o(\delta)$ for $\delta \in [0, 1]$ is $E_o(1)$, the exponent
will be $E_o(1)$ provided $G$ satisfies (35) with $\delta = 1$.

We must next check whether better results can be obtained with bound
(19b). It follows from (18b) and (18c) that choosing $\sigma$ to maximize
$-f_2(\sigma, \delta)$ will allow simultaneous maximization of $R$ and of the exponent
$\sigma\delta G - f_3(\sigma, \delta)$ provided (18a) can be satisfied. However, (18b) will in any
case force the exponent of (19b) not to exceed that of (19a). We state our
result as a theorem.

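Theorem 1

For $R \in [E_o(1), C]$ and a code of branch constraint length $u$,
the probability of undetectable error is bounded by

$$U(u) \le K_5\, 2^{-nuE_o(\delta)} \tag{37}$$

where $\delta \in [0, 1]$ is the solution of

$$R = \frac{1}{\delta} E_o(\delta) \tag{38}$$

and $K_5$ is finite if

$$\frac{1+\delta}{\delta}\left[f_3\!\left(\tfrac{1}{1+\delta}, \delta\right) + E_o(\delta)\right] \le G \le -\frac{1+\delta}{\delta}\, f_1\!\left(\frac{\delta}{1+\delta}\right). \tag{39}$$

For $R \in [0, E_o(1)]$, (37) holds if $\delta = 1$ and (39) is satisfied.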
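For the BSC with uniform inputs, which satisfies (40) below, E_o(δ) has a closed form and (38) can be solved by bisection. The bisection relies on E_o(δ)/δ decreasing in δ, which follows from the concavity of E_o noted in Section 5. A minimal sketch with hypothetical function names. Since (49) has the same form as (38), the root found here also marks the supremum of the γ values that (49) allows at rate R.

```python
import math

def e0_bsc(rho, p):
    """Gallager's E_0(rho) in bits for a BSC with crossover p and uniform inputs."""
    b = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** b + (1 - p) ** b)

def capacity_bsc(p):
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

def delta_for_rate(R, p, iters=100):
    """delta in (0, 1] with R = E_0(delta)/delta, eq. (38); 1.0 if R <= E_0(1)."""
    if R >= capacity_bsc(p):
        raise ValueError("R must be below capacity")
    if R <= e0_bsc(1.0, p):
        return 1.0
    lo, hi = 0.0, 1.0          # E_0(d)/d falls from C (d -> 0) to E_0(1) (d = 1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if e0_bsc(mid, p) / mid > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, R = 0.05, 0.6
d = delta_for_rate(R, p)
print(f"delta = {d:.4f}, exponent E_0(delta) = {e0_bsc(d, p):.4f} bits per digit")
```

The bound (37) then decays as 2 to the power -nu E_o(δ), with n and u the branch length and constraint length.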

It follows from Appendix II that the two extreme sides of (39) are
equal if and only if

$$\sum_x w(y \mid x)^{\frac{1}{1+\delta}}\, r(x) = \text{const} \quad \text{for all } y. \tag{40}$$

This is actually the case for the BSC when $r(x)$ is uniform, but is not true
in general. If (40) holds, then (39) reduces to the "usual" choice ([1],
p. 360)

$$G = \frac{1}{\delta} E_o(\delta). \tag{41}$$

We show in Appendix II that

$$\frac{1+\delta}{\delta}\left[f_3\!\left(\tfrac{1}{1+\delta}, \delta\right) + E_o(\delta)\right] \le \frac{1}{\delta} E_o(\delta) \le -\frac{1+\delta}{\delta}\, f_1\!\left(\frac{\delta}{1+\delta}\right) \tag{42}$$

so that Theorem 1 constitutes a real strengthening of the previous results
that provides us with a welcome leeway for choosing $G$.

We next turn to the optimization of the bound (20). In (20a), for a
fixed $\eta$ and $R$ (for reasons that will become apparent in the next section,
we are using the parameter $\eta$ instead of $\delta$) one wishes to select $\sigma$ so as to
maximize $-f_2(\sigma, \eta)$ and then choose $G$ sufficiently large to satisfy (21).
This implies that $\sigma = \frac{1}{1+\eta}$ and

$$G \ge \frac{1+\eta}{\eta}\left[f_3\!\left(\tfrac{1}{1+\eta}, \eta\right) - f_2\!\left(\tfrac{1}{1+\eta}, \eta\right)\right].$$

As a result of this choice,

$$P_f(t) \le K_7\, 2^{-nt[E_o(\eta) - \eta R]}. \tag{43}$$

As is well known, the exponent of (43) is maximized by the value of $\eta$
satisfying

$$R = E_o'(\eta). \tag{44}$$

It is immediately obvious that (20b) is optimized by the same value of $\sigma$
and by $G$ satisfying (42) with equality. This choice gives the same
exponent. We then get the following
Theorem 2

For $R \in [E_o'(1), C)$, the probability of failure of branch order $t < u$ is
bounded by

$$P_f(t) \le K_7\, 2^{-nt[E_o(\eta) - \eta R]} \tag{45}$$

where $\eta$ satisfies (44). $K_7$ is finite provided

$$G \ge \frac{1+\eta}{\eta}\left[E_o(\eta) + f_3\!\left(\tfrac{1}{1+\eta}, \eta\right)\right]. \tag{46}$$

For $R \in (0, E_o'(1))$, we choose $\eta = 1$ in both (45) and (46).

The above theorem shows that if $G$ satisfies (46) then the so-called
random block coding exponent applies to the probability of failure. Again,
if (40) holds, the righthand side of (46) reduces to the usual choice of
$G = \frac{1}{\eta} E_o(\eta)$ (see [1], p. 361). Because of the left inequality in (42),
Theorem 2 strengthens the previously published results.
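For the BSC with uniform inputs, η in (44) and the failure exponent of (45) can be computed from the closed form of E_o and its analytic derivative. A minimal sketch with hypothetical function names. The bisection relies on E_o' decreasing, that is, on the concavity of E_o.

```python
import math

def e0_bsc(rho, p):
    """Gallager's E_0(rho) in bits for a BSC with uniform inputs."""
    b = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** b + (1 - p) ** b)

def e0_prime_bsc(rho, p):
    """Analytic derivative dE_0/drho for the same channel."""
    b = 1.0 / (1.0 + rho)
    s = p ** b + (1 - p) ** b
    ds = p ** b * math.log(p) + (1 - p) ** b * math.log(1 - p)
    return 1.0 - math.log2(s) + ds / ((1.0 + rho) * s * math.log(2.0))

def eta_for_rate(R, p, iters=100):
    """eta in [0, 1] with R = E_0'(eta), eq. (44); eta = 1 when R <= E_0'(1)."""
    if R <= e0_prime_bsc(1.0, p):
        return 1.0
    lo, hi = 0.0, 1.0          # E_0' decreases from C at 0 to E_0'(1) at 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if e0_prime_bsc(mid, p) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def failure_exponent(R, p):
    """Exponent E_0(eta) - eta*R of (45), in bits per channel digit."""
    eta = eta_for_rate(R, p)
    return e0_bsc(eta, p) - eta * R

print(f"{failure_exponent(0.6, 0.05):.4f}")
```

At η = 0 the derivative reduces to the BSC capacity 1 - h(p), which is a convenient check on the analytic expression.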

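Our next topic is to maximize the value of $R$ for which $\overline{N^\gamma}$ is finite,
where $\gamma \in (0, 1]$. It follows from (24c) that $G$ must be made as large as
(24a) allows. Hence $R$ must satisfy

$$\gamma R < \max_{\sigma > 0} \min\left\{-f_2(\sigma, \gamma),\; -f_1(\sigma\gamma) - f_3(\sigma, \gamma)\right\}. \tag{47}$$

But, as already pointed out, $-f_2(\sigma, \gamma)$ is maximized by $\sigma = \frac{1}{1+\gamma}$, and

$$-f_2\!\left(\tfrac{1}{1+\gamma}, \gamma\right) \le -f_1\!\left(\frac{\gamma}{1+\gamma}\right) - f_3\!\left(\tfrac{1}{1+\gamma}, \gamma\right). \tag{48}$$

We thus have the following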
Theorem 3*

For $\gamma \in (0, 1]$, $\overline{N^\gamma}$ is finite provided

$$R < \frac{1}{\gamma} E_o(\gamma) \tag{49}$$

and

$$\frac{1+\gamma}{\gamma}\left[f_3\!\left(\tfrac{1}{1+\gamma}, \gamma\right) + E_o(\gamma)\right] \le G \le -\frac{1+\gamma}{\gamma}\, f_1\!\left(\frac{\gamma}{1+\gamma}\right). \tag{50}$$

We see that Theorem 3 represents the same strengthening of the usual
bound (see [1], p. 363) as Theorem 1 did. In particular, the usual choice
$G = \frac{1}{\gamma} E_o(\gamma)$ is within the range of the interval (50), which has non-zero length
whenever (40) does not hold.

Let us finally consider the bound (33). In Appendix III we have shown
that $f_1(\lambda)$ is a convex function with

$$f_1(0) = 0, \qquad f_1(1) \le 0.$$

Thus Figure 3 represents the graphical solution to the problem of
maximizing $s^*$ that satisfies (31). As is intuitively obvious, $s^*$ is a
monotonically increasing function of $G$, and $s^* = 1$ for $G = R$. We summarize
our conclusions in

Theorem 4

The probability $Q(a)$ that the likelihood of some path in the incorrect
subset exceeds $a$ is bounded by

$$Q(a) \le (2^{nR} - 1)\, 2^{-s^* a} \tag{51}$$

where $s^*$ is the maximum of at most two solutions of the equation

$$sG = R + f_1(1 - s). \tag{52}$$

* We call the reader's attention to the fact that Theorem 3 does not imply
that if (49) holds then the upper bound on $\overline{N^\gamma}$ is finite only if (50) holds as
well. When $G = R$, the conditions (24) reduce with the help of (42) to the
usual condition (49).
Let

$$G^+ = \lim_{s \to \infty} \frac{1}{s}\, f_1(1-s).$$

If $R > -f_1(1)$, there is a unique value $G^-$ such that (52) has two positive
solutions for $G \in (G^-, G^+)$ and no solution for $G < G^-$. If $R < -f_1(1)$, then
exactly one positive solution exists for all $G < G^+$. $s^*$ is a monotonically
increasing function of $G \in (G^-, G^+)$.

The reader should note that Theorems 1, 2, and 3 have a somewhat
different status than Theorem 4. There is definite practical value in
setting $G$ so as to minimize $P_f(t)$ and $U(u)$ for fixed $t$ and $u$, and to
maximize the Pareto exponent for a given rate $R$. On the other hand, it
would be foolish to blindly increase $G$ just to minimize the bound on
$Q(a)$ for fixed $a$. The latter is an arbitrary parameter which is used to
determine a backstop before which decoding information can safely be
released to the user. One might therefore wish to answer the following
question.

Given a prescribed average lag of released information 
behind maximum tree penetration by the decoder (information 
is assumed here to be released in accordance with the rule 
of the next to last paragraph of Section 1), how shall G and a 
be chosen so as to minimize the probability of error Q(a)? 

To answer the above question, note that the expected penetration
depth in branches necessary to achieve the likelihood increase $a$ is
given by

$$\bar{m} = \frac{a}{n\,E\!\left[\log_2 \frac{w(y \mid x)}{w(y)} - G\right]} = \frac{a}{n\left[I(X;Y) - G\right]} \tag{53}$$

since the denominator is the expected likelihood increase per
branch. Now from (51),

$$-\log_2 Q(a) \ge -nR + s^* a = -nR + \bar{m} n\left[s^* I(X;Y) - R - f_1(1 - s^*)\right]$$

where the value of $a$ was given by (53) and that of $G$ by (52).

It follows, therefore, that we wish to choose $s^*$ so as to maximize

$$s^* I(X;Y) - f_1(1 - s^*)$$

which is equivalent to choosing the largest $s^*$ such that

$$I(X;Y) = -f_1'(1 - s^*).$$

But by Theorem III-1, the desired $s^* = 1$, so the best choice is $G = R$.

We then get

Theorem 5

To minimize the random coding bound on the probability
of released information error, $Q^*(\bar{m})$, for a prescribed
average lag $\bar{m}$ of released information behind maximum tree
penetration by the decoder, the bias $G$ should be chosen to
equal $R$. Then

$$Q^*(\bar{m}) \le (2^{nR} - 1)\, 2^{-\bar{m} n[I(X;Y) - R]}$$

provided the likelihood decision threshold is set to

$$a = \bar{m} n\left[I(X;Y) - R\right].$$

It is worth noting that because of (38), (39), (42), (49), and (50),
the choice $G = R$ allows for simultaneous optimization of the bounds on
the Pareto exponent, $U(u)$, and $Q^*(\bar{m})$.
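Theorem 5 turns a prescribed average lag into a threshold and an error bound. A minimal sketch for the BSC with uniform inputs, where I(X;Y) = 1 - h(p) bits per channel digit. The function name and the sample parameters are illustrative, not values from the report.

```python
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def release_threshold(p, n, R, mean_lag):
    """Threshold a and bound on Q*(mean_lag) from Theorem 5 for a BSC.

    Uses G = R and s* = 1 with uniform inputs, so I(X;Y) = 1 - h(p) bits
    per channel digit. mean_lag is the prescribed average lag, in branches.
    """
    mutual_info = 1.0 - binary_entropy(p)
    if not 0 < R < mutual_info:
        raise ValueError("need 0 < R < I(X;Y)")
    a = mean_lag * n * (mutual_info - R)        # likelihood decision threshold
    bound = (2 ** (n * R) - 1) * 2 ** (-a)      # Q*(mean_lag) <= bound
    return a, bound

a, bound = release_threshold(p=0.05, n=2, R=0.5, mean_lag=20)
print(f"a = {a:.3f}, Q* <= {bound:.2e}")
```

Dividing a by n[I(X;Y) - G] with G = R recovers the prescribed lag, in agreement with (53).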



5. The Random Coding Bounds for Arbitrary Values of G 


In sequential decoding one ordinarily wishes to choose all parameters
so as to maximize the Pareto exponent $\alpha$, which determines the amount of
decoding effort. This is especially true in the range $\alpha \in (0, 1]$. Comparing
Theorems 1 and 3 we see that the bounds on $U(u)$ and $\overline{N^\gamma}$ require the same
optimal choice of $G$, namely in the range (50). From Theorem 5 we see
that to minimize $Q^*(\bar{m})$, $G$ ought to be selected equal to $R$. To
minimize $P_f(t)$, $G$ ought to be selected within the range (46), whose lower
limit is formally identical with that of (39), which minimizes $U(u)$. However,
the values of the parameters $\eta$ and $\delta$ appropriate for (46) and (39) are
different. For (46), $\eta$ satisfies $R = E_o'(\eta)$, while for (39), $R = \frac{1}{\delta} E_o(\delta)$.

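Because of the well-known concave nature of $E_o(\lambda)$,

$$E_o'(\lambda) \le \frac{1}{\lambda} E_o(\lambda) \tag{55}$$

since $E_o(0) = 0$. Hence

$$\eta \le \delta. \tag{56}$$

Thus for some channels at least [certainly for all channels
satisfying (40), such as the BSC]

$$\frac{1+\eta}{\eta}\left[E_o(\eta) + f_3\!\left(\tfrac{1}{1+\eta}, \eta\right)\right] > -\frac{1+\delta}{\delta}\, f_1\!\left(\frac{\delta}{1+\delta}\right) \tag{57}$$

so that there is no value of $G$ that would simultaneously optimize the
bounds on $U(u)$ and $P_f(t)$.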
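For a channel satisfying (40), both sides of (57) reduce to the usual choice E_o(x)/x, evaluated at η on the left and at δ on the right. The BSC with uniform inputs can therefore serve as a numerical check of (56) and (57). A minimal sketch with hypothetical function names.

```python
import math

def e0(rho, p):
    """Gallager's E_0(rho) in bits for a BSC with uniform inputs."""
    b = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** b + (1 - p) ** b)

def e0_prime(rho, p):
    """Analytic derivative dE_0/drho for the same channel."""
    b = 1.0 / (1.0 + rho)
    s = p ** b + (1 - p) ** b
    ds = p ** b * math.log(p) + (1 - p) ** b * math.log(1 - p)
    return 1.0 - math.log2(s) + ds / ((1.0 + rho) * s * math.log(2.0))

def bisect_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """x in [lo, hi] with f(x) = target, for f decreasing on (lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compare_parameters(R, p):
    """delta from (38), eta from (44), and the two bias limits compared in (57)."""
    delta = bisect_decreasing(lambda d: e0(d, p) / d, R) if R > e0(1.0, p) else 1.0
    eta = bisect_decreasing(lambda x: e0_prime(x, p), R) if R > e0_prime(1.0, p) else 1.0
    g_failure = e0(eta, p) / eta          # smallest G allowed by (46)
    g_undetected = e0(delta, p) / delta   # largest G allowed by (39)
    return delta, eta, g_failure, g_undetected

print(compare_parameters(0.6, 0.05))
```

In this case the largest bias allowed by (39) equals R itself, while (46) demands a strictly larger G. This is the conflict the text describes.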

We already remarked in the preceding section that for nonsymmetrical
channels a different input distribution $r(x)$ optimizes the different
performance parameters. As a consequence, the algorithms of the present
section will not be optimal for such channels.
A. Bounds on U(u) when (39) is not satisfied

Let $\delta_R$ satisfy $R = \frac{1}{\delta_R} E_o(\delta_R)$ if $R \ge E_o(1)$; otherwise let $\delta_R \triangleq 1$.

Because of (57), the more interesting case of violation of (39) is that

$$I(X;Y) > G > -\frac{1+\delta_R}{\delta_R}\, f_1\!\left(\frac{\delta_R}{1+\delta_R}\right).$$

We will now minimize the bound (19) for this case. (It is shown in
Appendix III, Theorem III-1, that $-f_1'(0) = I(X;Y)$. Therefore, unless the
lefthand side of (59) holds, neither (17a) nor (18a) can be satisfied.)


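Lemma 1

The exponent of the upper bound on $U(u)$ is minimized by some
$\delta \in (\delta^*, \delta_R)$, where $\delta^*$ is the unique solution of

$$G = -\frac{1+\delta^*}{\delta^*}\, f_1\!\left(\frac{\delta^*}{1+\delta^*}\right). \tag{60}$$

Proof

Because of the convex nature of $f_1(\lambda)$ and inequality (58),

$$\rho^* \triangleq \frac{\delta^*}{1+\delta^*} < \frac{\delta_R}{1+\delta_R} \tag{61}$$

so that $\delta^* < \delta_R$, as asserted. Because of (59), inequalities (17a) and (18a)
can be satisfied for a fixed $\delta$ only if $\sigma \in (0, \rho^*/\delta)$. Since for $\delta \in (\delta^*, \delta_R)$

$$\frac{\rho^*}{\delta} < \frac{1}{1+\delta},$$

then because of the concave nature of $-f_2(\sigma, \delta)$ as a function of $\sigma$, the former
is maximized over $\sigma \in (0, \rho^*/\delta]$ by the value $\sigma = \rho^*/\delta$. Therefore, from (17),
(18), and (19) the maximum achievable exponent cannot exceed $-f_2(\rho^*/\delta, \delta)$.

But for $\delta < \delta^*$, $-f_2\!\left(\frac{1}{1+\delta}, \delta\right) < -f_2\!\left(\frac{1}{1+\delta^*}, \delta^*\right)$ [see Appendix III, Theorem III-3],
and the exponent $-f_2\!\left(\frac{1}{1+\delta^*}, \delta^*\right)$ is achievable since the assignment $\sigma = \frac{1}{1+\delta^*}$,
$\delta = \delta^*$ satisfies (17a) because of (59), (17c) because of (35), and (17b)
because, by the concavity of $E_o(\delta)$,

$$R = \frac{1}{\delta_R} E_o(\delta_R) < \frac{1}{\delta^*} E_o(\delta^*). \tag{62}$$

Therefore only $\delta > \delta^*$ need be considered.

Adding (18b) and (18c) results in (17b), and for $\delta \in (\delta_R, 1)$

$$-\frac{1}{\delta} f_2(\sigma, \delta) \le -\frac{1}{\delta} f_2\!\left(\tfrac{1}{1+\delta}, \delta\right) < -\frac{1}{\delta_R} f_2\!\left(\tfrac{1}{1+\delta_R}, \delta_R\right) = R$$

so that neither (17) nor (18) can be satisfied. We thus conclude that
$\delta < \delta_R$. q.e.d.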


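Let us now pick $\delta \in (\delta^*, \delta_R)$ and try to find the value of $\sigma$ maximizing
the exponent. If

$$\delta R < -f_2(\rho^*/\delta, \delta) \tag{63}$$

does not hold, then that value of $\delta$ is inadmissible, since neither (17b) nor
(18b) and (18c) can be satisfied for any $\sigma$ in the allowed range $(0, \rho^*/\delta)$
[see the proof of the preceding Lemma]. Assume therefore that (63) does
hold, and suppose that

$$f_3(\rho^*/\delta, \delta) - f_2(\rho^*/\delta, \delta) < \rho^* G. \tag{64}$$

In this case the choice $\sigma = \rho^*/\delta$ satisfies (17a) and (17c). Since (17b) is
also satisfied and any smaller value of $\sigma$ decreases $-f_2(\sigma, \delta)$, the exponent
is equal to $-f_2(\rho^*/\delta, \delta)$.

Next, suppose that (64) does not hold and let $\sigma_1$ be the largest value
in $(0, \rho^*/\delta)$ such that

$$\sigma_1 \delta G = f_3(\sigma_1, \delta) - f_2(\sigma_1, \delta). \tag{65}$$

If (17b) holds with $\sigma = \sigma_1$, then the largest conceivable exponent
obtainable from bound (19a) is $-f_2(\sigma_1, \delta)$, which is at most as large as the
exponent from (19b) obtainable for some $\sigma \in (\sigma_1, \rho^*/\delta)$.

Thus if (64) does not hold, we need consider only the bound (19b).
Let $\sigma_G(\delta)$ be the unique value satisfying

$$\delta G = f_3'(\sigma_G(\delta), \delta) \tag{66}$$

(the prime denoting differentiation with respect to $\sigma$), which exists provided
$\delta G < f_3'(\infty, \delta)$ [see Appendix III, Theorem III-4]. If (66) cannot be satisfied,
we set $\sigma_G(\delta) = \infty$. Suppose

$$\sigma_G(\delta) \ge \rho^*/\delta. \tag{67}$$

Then with $\sigma = \rho^*/\delta$, (18a) and (18b) are satisfied, and if

$$R < \frac{1}{\delta}\left[\rho^* G - f_3(\rho^*/\delta, \delta)\right] \tag{68}$$

then $\rho^* G - f_3(\rho^*/\delta, \delta)$ is the best obtainable exponent for that value of $\delta$.
If (68) is not satisfied, $\delta$ is not admissible. If (67) does not hold and
$\sigma_1 \ge \sigma_G(\delta)$, then the best attainable exponent is $-f_2(\sigma_1, \delta)$ provided (18c)
holds, while if $\sigma_G(\delta) \in (\sigma_1, \rho^*/\delta)$ then the best exponent is
$\sigma_G(\delta)\delta G - f_3(\sigma_G(\delta), \delta)$, provided (18c) holds. If (18c) does not hold, $\delta$ is not
admissible.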

We can now state an algorithm that will obtain the best exponent for the upper bound on U(u) for a fixed R and G satisfying (59).

1. Find the interval $(\delta^*, \delta_R)$ and $\rho^*$.

2. Pick $\delta \in (\delta^*, \delta_R)$ and check if (63) holds. If it does not, $\delta$ is not admissible. Otherwise continue.

3. If (64) holds, let

$$E_u(\delta) = -f_2(\rho^*/\delta, \delta).$$

4. If (64) does not hold, check if

$$\delta G - f_3'(\rho^*/\delta, \delta) > 0. \qquad (69)$$

If (69) holds and (18c) is not satisfied with $\sigma = \rho^*/\delta$, then $\delta$ is not admissible. Otherwise

$$E_u(\delta) = \rho^* G - f_3(\rho^*/\delta, \delta).$$

5. If (69) does not hold, find the largest $\sigma_1 \in (0, \rho^*/\delta)$ satisfying (65) and check whether

$$\delta G - f_3'(\sigma_1, \delta) < 0. \qquad (70)$$

If (70) holds and (18c) is not satisfied with $\sigma = \sigma_1$, then $\delta$ is not admissible. Otherwise,

$$E_u(\delta) = -f_2(\sigma_1, \delta).$$

6. If (70) does not hold, find $\sigma_G(\delta)$ satisfying (66) [necessarily $\sigma_G(\delta) \in (\sigma_1, \rho^*/\delta)$]. If (18c) does not hold with $\sigma = \sigma_G(\delta)$, then $\delta$ is not admissible; otherwise

$$E_u(\delta) = \sigma_G(\delta)\delta G - f_3(\sigma_G(\delta), \delta).$$

7. Repeat from step 2 on, so as to obtain a plot of $E_u(\delta)$ for all admissible values $\delta \in (\delta^*, \delta_R)$. The maximum of this plot is the desired exponent.
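Steps 4 to 6 repeatedly need $\sigma_G(\delta)$ from (66). Because $f_3$ is convex in $\sigma$ (Theorem III-4), its derivative is nondecreasing, so plain bisection finds the root. The Python sketch below is illustrative only: it assumes the form $f_3(\sigma,\delta) = \log_2 E_y (E_x X_y^{\sigma})^{\delta}$ with $X_y = w(y|x)/w(y)$, as read off from Appendix III, a hypothetical binary symmetric channel with crossover probability 0.05 and equiprobable inputs, and a numerical derivative in place of an analytic $f_3'$. It returns `math.inf` when no root lies below a cutoff, mirroring the convention $\sigma_G(\delta) = \infty$.

```python
import math

# Hypothetical example channel (not from the report): a binary symmetric
# channel with crossover probability P_CROSS and equiprobable inputs r(x).
P_CROSS = 0.05
R_IN = (0.5, 0.5)
W = ((1 - P_CROSS, P_CROSS), (P_CROSS, 1 - P_CROSS))  # W[x][y] = w(y|x)
W_OUT = tuple(sum(R_IN[x] * W[x][y] for x in range(2)) for y in range(2))


def f3(sigma, delta):
    """Assumed form f_3(sigma, delta) = log2 E_y (E_x X_y^sigma)^delta, X_y = w(y|x)/w(y)."""
    total = 0.0
    for y in range(2):
        inner = sum(R_IN[x] * (W[x][y] / W_OUT[y]) ** sigma for x in range(2))
        total += W_OUT[y] * inner ** delta
    return math.log2(total)


def df3(sigma, delta, h=1e-6):
    """Central-difference estimate of the partial derivative of f_3 in sigma."""
    return (f3(sigma + h, delta) - f3(sigma - h, delta)) / (2 * h)


def sigma_G(delta, G, sigma_max=50.0, iters=100):
    """Solve delta*G = df3(sigma, delta), condition (66), by bisection.

    f_3 is convex in sigma, so df3 is nondecreasing and the root maximizes
    sigma*delta*G - f_3(sigma, delta). Returns math.inf if the root lies
    beyond sigma_max, and 0.0 if the maximizer sits at sigma = 0.
    """
    target = delta * G
    if df3(sigma_max, delta) < target:
        return math.inf
    lo, hi = 0.0, sigma_max
    if df3(lo, delta) >= target:
        return lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if df3(mid, delta) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    s = sigma_G(delta=0.5, G=0.5)
    print(f"sigma_G = {s:.6f}, exponent term = {s * 0.5 * 0.5 - f3(s, 0.5):.6f}")
```

For this channel the slope of $f_3$ levels off at $\delta\log_2 1.9$, so a bias such as G = 2 with $\delta = 0.5$ has no root and yields infinity.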


We expect that $(\delta^*, \delta_R)$ will contain only one sub-interval of admissible values of $\delta$, and that over that sub-interval $E_u(\delta)$ will be unimodal.

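Let us next consider the case

$$0 < G < \frac{1+\delta_R}{\delta_R}\left[f_3\left(\frac{1}{1+\delta_R}, \delta_R\right) - f_2\left(\frac{1}{1+\delta_R}, \delta_R\right)\right]. \qquad (71)$$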
Lemma 2

The exponent of the upper bound on U(u) is minimized by some $\delta \in (\delta_1, \delta_R)$, where $\delta_1$ is either the largest $\delta \in (0, \delta_R)$ such that

$$\frac{\delta_1}{1+\delta_1} G = f_3\left(\frac{1}{1+\delta_1}, \delta_1\right) - f_2\left(\frac{1}{1+\delta_1}, \delta_1\right) \qquad (72)$$

or is 0 if (72) cannot be satisfied.

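Proof

Let $\sigma_\delta$ be the "best" value of $\sigma$ for some $\delta \in (0, 1)$. If $\delta$ is admissible, then

$$R \le -\frac{1}{\delta} f_2(\sigma_\delta, \delta) \le -\frac{1}{\delta} f_2\left(\frac{1}{1+\delta}, \delta\right). \qquad (73)$$

But the righthand side of (73) is a decreasing function of $\delta$, so if $\delta_R < 1$, the lefthand inequality in (73) can hold only if $\delta < \delta_R$.

Next, let $\delta_1$ be as defined in the Lemma. Because (35) holds, for $\delta = \delta_1$ and $\sigma = \frac{1}{1+\delta_1}$ all the conditions (17) are satisfied, so that the exponent for this value of R and G is at least $-f_2\left(\frac{1}{1+\delta_1}, \delta_1\right)$. Because $-f_2\left(\frac{1}{1+\delta}, \delta\right)$ is an increasing function of $\delta$, for all $\delta < \delta_1$ the exponent is smaller than $-f_2\left(\frac{1}{1+\delta_1}, \delta_1\right)$, so only $\delta > \delta_1$ need be considered.

Q.E.D.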


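It follows from the definition of $\delta_1$ and from (71) that for all $\delta \in (\delta_1, \delta_R)$,

$$\frac{\delta}{1+\delta} G < f_3\left(\frac{1}{1+\delta}, \delta\right) - f_2\left(\frac{1}{1+\delta}, \delta\right) \le -f_1\left(\frac{\delta}{1+\delta}\right). \qquad (74)$$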

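Let $\sigma_2$ be the largest value of $\sigma \in \left(0, \frac{1}{1+\delta}\right)$ such that

$$\sigma\delta G < f_3(\sigma, \delta) - f_2(\sigma, \delta) \qquad (18b)$$

holds with equality, and let $\sigma_1$ be the smallest value of $\sigma > \frac{1}{1+\delta}$ for which (18b) holds with equality. From (74) it follows that (18b) holds for all $\sigma \in (\sigma_2, \sigma_1)$ and that $\sigma_2 < \rho^*/\delta$. Therefore (18a) and (18b) both hold for $\sigma \in (\sigma_2, \sigma_3)$, where

$$\sigma_3 = \min\{\sigma_1, \rho^*/\delta\}. \qquad (75)$$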

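Let $\sigma_G(\delta)$ be as defined in (66). If $\sigma_G(\delta) \in (\sigma_2, \sigma_3)$ and

$$\delta R < \sigma_G(\delta)\delta G - f_3(\sigma_G(\delta), \delta) \qquad (76)$$

then the righthand side of (76) is the exponent. If $\sigma_G(\delta) \le \sigma_2$ and

$$\delta R < \sigma_2 \delta G - f_3(\sigma_2, \delta) \qquad (77)$$

then the righthand side of (77) is the exponent. If $\sigma_G(\delta) \ge \sigma_3$ and

$$\delta R < \sigma_3 \delta G - f_3(\sigma_3, \delta) \qquad (77a)$$

then the righthand side of (77a) is the exponent. If none of the three cases holds, $\delta$ is inadmissible. We therefore get the following algorithm.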

1. Find the interval $(\delta_1, \delta_R)$ and $\rho^*$.

2. Pick $\delta \in (\delta_1, \delta_R)$ and compute $\sigma_2$, $\sigma_3$.

3. If

$$\delta G - f_3'(\sigma_3, \delta) > 0, \qquad (78)$$

check whether (77a) holds. If it does not, $\delta$ is inadmissible; if it does, the exponent is

$$E_u(\delta) = \sigma_3 \delta G - f_3(\sigma_3, \delta).$$

4. If (78) does not hold and

$$\delta G - f_3'(\sigma_2, \delta) < 0, \qquad (79)$$

check whether (77) holds. If it does not, $\delta$ is inadmissible; if it does, the exponent is

$$E_u(\delta) = \sigma_2 \delta G - f_3(\sigma_2, \delta).$$

5. If neither (78) nor (79) holds, determine $\sigma_G(\delta)$ satisfying (66). If (76) does not hold, $\delta$ is inadmissible; if it does hold, then the exponent is

$$E_u(\delta) = \sigma_G(\delta)\delta G - f_3(\sigma_G(\delta), \delta).$$

6. Repeat from step 2 on so as to obtain a plot of $E_u(\delta)$ for all admissible values $\delta \in (\delta_1, \delta_R)$. The maximum is the desired exponent.
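Each algorithm in this section ends with the same outer loop: evaluate the exponent for each candidate $\delta$, discard inadmissible values, and keep the maximum. A minimal grid-search sketch of that loop follows. `exponent_at` is a hypothetical callback standing in for the per-$\delta$ steps (it returns `None` for an inadmissible $\delta$), and `toy_exponent` is made up purely to exercise the loop. A plain grid is adequate if, as conjectured earlier in this section, the admissible values form one sub-interval on which the exponent is unimodal; otherwise a finer grid would be needed.

```python
import math
from typing import Callable, List, Optional, Tuple


def open_grid(a: float, b: float, n: int) -> List[float]:
    """n equally spaced interior points of the open interval (a, b)."""
    step = (b - a) / (n + 1)
    return [a + k * step for k in range(1, n + 1)]


def best_over_delta(
    grid: List[float], exponent_at: Callable[[float], Optional[float]]
) -> Tuple[Optional[float], float]:
    """Outer loop 'repeat from step 2 ... take the maximum' as a grid search.

    exponent_at(delta) carries out the per-delta steps and returns the
    exponent, or None if delta is inadmissible. Returns (argmax, max), or
    (None, -inf) when no grid point is admissible.
    """
    best_delta, best_value = None, -math.inf
    for delta in grid:
        value = exponent_at(delta)
        if value is not None and value > best_value:
            best_delta, best_value = delta, value
    return best_delta, best_value


def toy_exponent(delta: float) -> Optional[float]:
    """Made-up stand-in: admissible only on (0.3, 0.8), unimodal there."""
    if not 0.3 < delta < 0.8:
        return None
    return 0.2 - (delta - 0.5) ** 2


if __name__ == "__main__":
    print(best_over_delta(open_grid(0.2, 0.9, 69), toy_exponent))
```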


B. Bound on $P_f(t)$ when (46) is not satisfied

We will now see how to optimize bound (20) for a fixed G less than the righthand side of (46).

Lemma 3

If, when $\eta$ satisfies (44), the inequality (46) does not hold, then the value of $\delta$ optimizing the bound (20) on $P_f(t)$ is within the interval $(\delta_m, \delta_M)$, where $\delta_m$ ($\delta_M$) is the largest $\delta < \eta$ (smallest $\delta > \eta$) such that

$$\frac{\delta}{1+\delta} G - f_3\left(\frac{1}{1+\delta}, \delta\right) + f_2\left(\frac{1}{1+\delta}, \delta\right) = 0, \qquad (77)$$

or is equal to 0 (equal to 1), whichever is larger (smaller).
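Proof

First note that only $\delta \in [0, 1]$ are admissible by the bound. If $\delta \in [0, \delta_m)$ ($\delta \in (\delta_M, 1]$), then because of the concave nature of $-f_2\left(\frac{1}{1+\delta}, \delta\right) - \delta R$, the largest value of the exponent cannot exceed

$$-f_2\left(\frac{1}{1+\delta_m}, \delta_m\right) - \delta_m R \qquad \left(-f_2\left(\frac{1}{1+\delta_M}, \delta_M\right) - \delta_M R\right), \qquad (78)$$

otherwise the optimal exponent for any G would not be achieved at $\eta$. However, because of (77) the values (78) are achievable with the given G, and so the optimizing $\delta \in (\delta_m, \delta_M)$. Q.E.D.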

Consider now $\delta \in (\delta_m, \delta_M)$ fixed. We will see how to find the value of $\sigma > 0$ that optimizes the exponent in (20). We will use the fact that $-f_2(\sigma, \delta)$ and $-f_3(\sigma, \delta)$ are both concave functions of $\sigma$ that are positive over some interval of $\sigma$ [see Appendix III, Theorems III-2 and III-4], and that $-f_2(\sigma, \delta)$ is maximized at $\sigma = \frac{1}{1+\delta}$. For $\delta \in (\delta_m, \delta_M)$, the lefthand side of (77) is negative. If $\sigma_G(\delta)$ maximizes $\sigma\delta G - f_3(\sigma, \delta)$, then there are two cases. If

$$\sigma_G(\delta)\delta G - f_3(\sigma_G(\delta), \delta) \le -f_2(\sigma_G(\delta), \delta) \qquad (79)$$

then, because of the concave nature of $-f_2$ and $-f_3$, the best exponent $E_f(\delta)$ is given by

$$E_f(\delta) = \sigma_G(\delta)\delta G - f_3(\sigma_G(\delta), \delta) - \delta R. \qquad (80)$$

On the other hand, if (79) does not hold and $\sigma_G(\delta) < \frac{1}{1+\delta}$, then there exists a unique $\sigma_4 \in \left(\sigma_G(\delta), \frac{1}{1+\delta}\right)$ such that

$$\sigma_4 \delta G - f_3(\sigma_4, \delta) = -f_2(\sigma_4, \delta) \qquad (81)$$

and the best exponent is

$$E_f(\delta) = -f_2(\sigma_4, \delta) - \delta R. \qquad (82)$$

Of course, if $\sigma_G(\delta) > \frac{1}{1+\delta}$, then the $\sigma_4 \in \left(\frac{1}{1+\delta}, \sigma_G(\delta)\right)$ satisfying (81) is desired.

In finding the best exponent for the upper bound on $P_f(t)$ when (46) is not satisfied, one proceeds as follows:

1. Find the interval $(\delta_m, \delta_M)$.

2. Pick $\delta \in (\delta_m, \delta_M)$ and find $\sigma_G(\delta)$ satisfying (66).

3. If (79) is satisfied, $E_f(\delta)$ is given by (80). Go to step 5.

4. If (79) is not satisfied, find $\sigma_4$. $E_f(\delta)$ is given by (82).

5. Repeat from step 2 on so as to obtain a plot of $E_f(\delta)$ vs. $\delta$. The maximum of this plot is the desired exponent.
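Step 4 needs the crossing point $\sigma_4$ of (81), that is, the zero of $h(\sigma) = \sigma\delta G - f_3(\sigma,\delta) + f_2(\sigma,\delta)$ on the bracket $\left(\sigma_G(\delta), \frac{1}{1+\delta}\right)$ or its reverse; the same kind of root is needed for $\sigma_5$ in (84) and $\sigma_6$ in (87) below. A generic bisection sketch follows, assuming only that h changes sign on the bracket; the cubic used in the usage lines is a hypothetical stand-in for h, not a function from this report.

```python
from typing import Callable


def unique_root(h: Callable[[float], float], a: float, b: float,
                tol: float = 1e-12, max_iter: int = 200) -> float:
    """Bisection for the zero of h between a and b, assuming h(a), h(b) differ in sign.

    For (81): h(s) = s*delta*G - f3(s, delta) + f2(s, delta), with the bracket
    (sigma_G(delta), 1/(1+delta)) or its reverse; a may exceed b.
    """
    ha, hb = h(a), h(b)
    if ha == 0.0:
        return a
    if hb == 0.0:
        return b
    if (ha > 0) == (hb > 0):
        raise ValueError("h does not change sign on the bracket")
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        hm = h(m)
        if hm == 0.0 or abs(b - a) < tol:
            return m
        if (hm > 0) == (ha > 0):
            a, ha = m, hm  # the root lies between m and b
        else:
            b = m          # the root lies between a and m
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Hypothetical stand-in for h, with a single crossing at 0.2 ** (1/3).
    print(unique_root(lambda s: s ** 3 - 0.2, 0.0, 1.0))
```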


C. Pareto Exponent for Arbitrary G

Let $R > E_o(1)$ and let $\delta_R$ be as defined previously. We will first find the lower bound on the Pareto exponent when

$$I(X;Y) > G > -\frac{1+\delta_R}{\delta_R} f_1\left(\frac{\delta_R}{1+\delta_R}\right). \qquad (58)$$

We wish to find the largest possible value of $\gamma$ such that (24) can be satisfied for some $\sigma > 0$.

Lemma 4

If (59) is satisfied, then the best lower bound on the Pareto exponent $\gamma$ falls within the interval $(\delta^*, \delta_R)$, where $\delta^*$ satisfies (59).

Proof

Since G is not chosen optimally, $\gamma < \delta_R$. Since $\delta^* < \delta_R$ (see Lemma 1), we need only show that $\gamma = \delta^*$ satisfies (24) for $\sigma = \rho^*/\delta^*$, where $\rho^*$ was defined in (60). But that choice satisfies (59) and therefore (24a). Furthermore, by concavity of $E_o(\delta)$,

$$R + \frac{1}{\delta^*} f_2\left(\frac{\rho^*}{\delta^*}, \delta^*\right) = R - \frac{E_o(\delta^*)}{\delta^*} < R - \frac{E_o(\delta_R)}{\delta_R} = 0,$$

so that (24b) is satisfied as well. Finally, from (59) and (48),

$$\delta^* R + f_3\left(\frac{\rho^*}{\delta^*}, \delta^*\right) - \rho^* G = \delta^* R + f_3\left(\frac{1}{1+\delta^*}, \delta^*\right) + f_1\left(\frac{\delta^*}{1+\delta^*}\right) \le \delta^* R + f_2\left(\frac{1}{1+\delta^*}, \delta^*\right) < 0,$$

so (24c) is satisfied as well. Q.E.D.

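Let $\gamma \in (\delta^*, \delta_R)$. (24a) can be satisfied only with $\sigma \le \rho^*/\gamma$. Also, since $\delta^* < \gamma$,

$$\rho^* = \frac{\delta^*}{1+\delta^*} < \frac{\gamma}{1+\gamma},$$

so that $-f_2(\sigma, \gamma)$ is maximized over $(0, \rho^*/\gamma]$ by $\sigma = \rho^*/\gamma$. Thus, if

$$R > -\frac{1}{\gamma} f_2\left(\frac{\rho^*}{\gamma}, \gamma\right) \qquad (83)$$

then the Pareto exponent is less than $\gamma$. If (83) does not hold and $\sigma_G(\gamma) \ge \rho^*/\gamma$, then the Pareto exponent is less than $\gamma$ if (24c) is not satisfied with $\sigma = \rho^*/\gamma$, and it exceeds $\gamma$ otherwise. If $\sigma_G(\gamma) < \rho^*/\gamma$, let $\sigma_5$ be the unique value of $\sigma \in (\sigma_G(\gamma), \rho^*/\gamma)$ such that

$$f_2(\sigma_5, \gamma) = f_3(\sigma_5, \gamma) - \sigma_5 \gamma G. \qquad (84)$$

The Pareto exponent then exceeds $\gamma$ if (24b) holds with $\sigma = \sigma_5$ and is less than $\gamma$ otherwise.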

If (59) holds, the best lower bound on the Pareto exponent is found by the following method:

1. Find $(\delta^*, \delta_R)$ and $\rho^*$. Let $a_1 = \delta^*$, $a_2 = \delta_R$.

2. If $a_2 - a_1 < \epsilon$, the exponent is at least $a_1$. Stop. Otherwise pick $\gamma \in (a_1, a_2)$. If

$$R > -\frac{1}{\gamma} f_2\left(\frac{\rho^*}{\gamma}, \gamma\right),$$

set $a_2 = \gamma$ and go to step 2. Otherwise continue.

3. If

$$\gamma G - f_3'(\rho^*/\gamma, \gamma) < 0,$$

go to step 4. Otherwise, if (24c) is satisfied with $\sigma = \rho^*/\gamma$, set $a_1 = \gamma$. If (24c) is not satisfied, set $a_2 = \gamma$. Go to step 2.

4. Find $\sigma_5$ satisfying (84). If (24b) holds with $\sigma = \sigma_5$, set $a_1 = \gamma$. Otherwise set $a_2 = \gamma$. Go to step 2.
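Steps 1 to 4 amount to a bisection on $\gamma$ that keeps $a_1$ below and $a_2$ above the best lower bound on the Pareto exponent. In the sketch below, `exponent_exceeds` is a hypothetical predicate standing in for the tests of steps 2 to 4 (True when they show that the exponent exceeds $\gamma$). Taking the midpoint of $(a_1, a_2)$ is our choice; the method allows any $\gamma$ in that interval.

```python
from typing import Callable


def pareto_lower_bound(a1: float, a2: float,
                       exponent_exceeds: Callable[[float], bool],
                       eps: float = 1e-6) -> float:
    """Bisection of steps 1-4: returns a1 once a2 - a1 < eps.

    a1, a2 start at the ends of the search interval, e.g. (delta*, delta_R).
    exponent_exceeds(gamma) is a hypothetical predicate standing in for
    steps 2-4: True if the tests show the Pareto exponent exceeds gamma.
    """
    while a2 - a1 >= eps:
        gamma = 0.5 * (a1 + a2)  # any gamma in (a1, a2) is allowed; midpoint chosen here
        if exponent_exceeds(gamma):
            a1 = gamma
        else:
            a2 = gamma
    return a1  # "the exponent is at least a1"


if __name__ == "__main__":
    # Toy predicate whose true threshold is 0.37 (made up for illustration).
    print(pareto_lower_bound(0.1, 0.9, lambda g: g < 0.37))
```

The same loop serves the algorithm for case (71) below, with $a_1 = \delta_1$ as the starting lower end.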

We will conclude this section by treating the case

$$0 < G < \frac{1+\delta_R}{\delta_R}\left[f_3\left(\frac{1}{1+\delta_R}, \delta_R\right) + E_o(\delta_R)\right]. \qquad (71)$$


Lemma 5

If (71) is satisfied, then the best lower bound on the Pareto exponent $\gamma$ falls within the interval $(\delta_1, \delta_R)$, where $\delta_1$ is either the largest $\delta \in (0, \delta_R)$ for which (72) holds, or is 0 if (72) cannot be satisfied.

Proof

We omit the proof, which is similar to that of Lemma 2. Q.E.D.
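Let $\gamma \in (\delta_1, \delta_R)$. Then (24a) is satisfied for all $\sigma \le \rho^*/\gamma$. Furthermore, since

$$\frac{\gamma}{1+\gamma} G < f_3\left(\frac{1}{1+\gamma}, \gamma\right) - f_2\left(\frac{1}{1+\gamma}, \gamma\right) \le -f_1\left(\frac{\gamma}{1+\gamma}\right),$$

then

$$\frac{1}{1+\gamma} < \frac{\rho^*}{\gamma}. \qquad (85)$$

For the sake of brevity, we shall immediately describe the algorithm that obtains the best lower bound on the Pareto exponent.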
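1. Find $(\delta_1, \delta_R)$ and $\rho^*$. Let $a_1 = \delta_1$, $a_2 = \delta_R$.

2. If $a_2 - a_1 < \epsilon$, the exponent is at least $a_1$. Stop. Otherwise, pick $\gamma \in (a_1, a_2)$. If

$$R < \frac{1}{1+\gamma} G - \frac{1}{\gamma} f_3\left(\frac{1}{1+\gamma}, \gamma\right),$$

set $a_1 = \gamma$ and go to step 2. Otherwise, continue.

3. If

$$\rho^* G - f_3(\rho^*/\gamma, \gamma) < -f_2(\rho^*/\gamma, \gamma), \qquad (86)$$

go to step 4. Otherwise there is a unique $\sigma_6 \in \left(\frac{1}{1+\gamma}, \rho^*/\gamma\right)$ such that

$$\gamma\sigma_6 G - f_3(\sigma_6, \gamma) = -f_2(\sigma_6, \gamma). \qquad (87)$$

If (24b) is satisfied with $\sigma = \sigma_6$, set $a_1 = \gamma$. If it is not satisfied, set $a_2 = \gamma$. Go to step 2.

4. If

$$\gamma G - f_3'(\rho^*/\gamma, \gamma) < 0,$$

go to step 5. Otherwise, if (24c) is satisfied with $\sigma = \rho^*/\gamma$, set $a_1 = \gamma$. If it is not satisfied, set $a_2 = \gamma$. Go to step 2.

5. Find $\sigma_G(\gamma)$ satisfying (66). If

$$\sigma_G(\gamma)\gamma G - f_3(\sigma_G(\gamma), \gamma) > -f_2(\sigma_G(\gamma), \gamma),$$

then go to step 6. Otherwise, if (24c) is satisfied with $\sigma = \sigma_G(\gamma)$, set $a_1 = \gamma$. If it is not satisfied, set $a_2 = \gamma$. Go to step 2.

6. If $\sigma_G(\gamma) > \frac{1}{1+\gamma}$ [$\sigma_G(\gamma) < \frac{1}{1+\gamma}$], there is a unique

$$\sigma_6 \in \left(\frac{1}{1+\gamma}, \sigma_G(\gamma)\right) \qquad \left[\sigma_6 \in \left(\sigma_G(\gamma), \frac{1}{1+\gamma}\right)\right]$$

for which (87) holds. If (24b) is satisfied with $\sigma = \sigma_6$, set $a_1 = \gamma$. If it is not satisfied, set $a_2 = \gamma$. Go to step 2.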
6. Optimal Expurgated Bounds

In this section we will develop expurgated upper bounds to the probabilities of undetected error and of failure. We will use the notation of Chapter 10 of Jelinek [1]. We will limit our attention to convolutional codes and channels symmetrical from the input, so that for any given code the probability that an information sequence is incorrectly decoded is the same for all sequences. We will therefore always assume that the all-zero sequence was transmitted.

If $\mathbf{y}$ is received, an undetected error will take place at depth 0 only if

$$L(\mathbf{s}) - L_m > 0 \quad \text{for some } \mathbf{s} \in \mathcal{G}_o^{t+\nu},\ t \ge 0,\ m \ge 0. \qquad (88)$$

Hence if an undetected error takes place, then

$$\sum_{t=0}^{\infty}\sum_{m=0}^{\infty}\sum_{\mathbf{s} \in \mathcal{G}_o^{t+\nu}} 2^{\sigma[L(\mathbf{s}) - L_m]} \ge 1 \qquad (89)$$

for all $\sigma > 0$. Let $\varphi(\mathbf{y})$ be the undetected error indicator function for a fixed convolutional code C of constraint length $\nu$ and a received sequence $\mathbf{y}$. Then the probability of undetected error at depth 0 is given by

$$P_u^C(e) = E_{\mathbf{y}}[\varphi(\mathbf{y})] \qquad (90)$$

and the probability $P\{C: P_u^C(e) > B\}$ of selecting a code from the ensemble whose undetected error probability exceeds some number B is bounded by

$$P\{C: P_u^C(e) > B\} \le B^{-1/\rho} E_C\left[\left(E_{\mathbf{y}}[\varphi(\mathbf{y})]\right)^{1/\rho}\right] \qquad (91)$$

where $E_C$ denotes averaging over the ensemble. Thus the probability is at most 1/2 that a code will be selected whose probability of undetected error exceeds
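$$B = 2^{\rho}\left\{E_C\left[\left(E_{\mathbf{y}}\sum_{t=0}^{\infty}\sum_{m=0}^{\infty}\sum_{\mathbf{s} \in \mathcal{G}_o^{t+\nu}} 2^{\sigma[L(\mathbf{s}) - L_m]}\right)^{1/\rho}\right]\right\}^{\rho} \qquad (92)$$

where we took into account the fact that the lefthand side of (89) exceeds $\varphi(\mathbf{y})$. Let $\mathbf{u}_o$ stand for the all-zero sequence. Then we can re-write (92) as

$$B = 2^{\rho}\left\{E_C\left[\left(\sum_{t=0}^{\infty}\sum_{m=0}^{\infty} 2^{\sigma G(m-t-\nu)} \sum_{\mathbf{s} \in \mathcal{G}_o^{t+\nu}} \sum_{\mathbf{y}} w(\mathbf{y}|\mathbf{u}_o)\left[\frac{w(\mathbf{y}^{t+\nu}|\mathbf{s})}{w(\mathbf{y}^{t+\nu})}\right]^{\sigma}\left[\frac{w(\mathbf{y}^{m})}{w(\mathbf{y}^{m}|\mathbf{u}_o)}\right]^{\sigma}\right)^{1/\rho}\right]\right\}^{\rho}$$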

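If $\rho > 1$, then Jensen's inequality yields

$$B \le 2^{\rho}\left\{\sum_{t=0}^{\infty}\sum_{m=0}^{\infty} 2^{\frac{\sigma}{\rho}G(m-t-\nu)} \sum_{\mathbf{s} \in \mathcal{G}_o^{t+\nu}} E_C\left[\left(\sum_{\mathbf{y}} w(\mathbf{y}|\mathbf{u}_o)\left[\frac{w(\mathbf{y}^{t+\nu}|\mathbf{s})}{w(\mathbf{y}^{t+\nu})}\right]^{\sigma}\left[\frac{w(\mathbf{y}^{m})}{w(\mathbf{y}^{m}|\mathbf{u}_o)}\right]^{\sigma}\right)^{1/\rho}\right]\right\}^{\rho} \qquad (93)$$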


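We now define the exponent functions

$$g_1(\sigma) = \log\sum_y w(y|0)^{1-\sigma} w(y)^{\sigma} \qquad (94)$$

$$g_2(\sigma, \rho) = \rho\log\sum_x r(x)\left[\sum_y w(y|0)^{\sigma} w(y|x)^{1-\sigma}\right]^{1/\rho} \qquad (95)$$

$$g_3(\sigma, \rho) = \rho\log\sum_x r(x)\left[\sum_y w(y|0)\left(\frac{w(y|x)}{w(y)}\right)^{\sigma}\right]^{1/\rho} \qquad (96)$$

with whose help we can bound the expectation in (93). Denoting the latter by F(m, t), we get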
$$F(m, t) \le \begin{cases} 2^{m g_2(\sigma,\rho) + (t+\nu-m) g_3(\sigma,\rho)} & \text{if } m \le t+\nu \\ 2^{(t+\nu) g_2(\sigma,\rho) + (m-t-\nu) g_1(\sigma)} & \text{if } m > t+\nu \end{cases} \qquad (97)$$


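After some algebra that is identical to that used to derive (19), we finally get the bound

$$B \le \begin{cases} K\,2^{\nu g_2(\sigma,\rho)}, & \text{where K is finite if (99) holds} \\ K\,2^{\nu[g_3(\sigma,\rho) - \sigma G]}, & \text{where K is finite if (100) holds} \end{cases} \qquad (98)$$

$$\sigma G + g_1(\sigma) < 0 \qquad (99a)$$

$$\rho R + g_2(\sigma, \rho) < 0 \qquad (99b)$$

$$g_3(\sigma, \rho) - g_2(\sigma, \rho) - \sigma G < 0 \qquad (99c)$$

$$\sigma G + g_1(\sigma) < 0 \qquad (100a)$$

$$\sigma G + g_2(\sigma, \rho) - g_3(\sigma, \rho) < 0 \qquad (100b)$$

$$g_3(\sigma, \rho) + \rho R - \sigma G < 0 \qquad (100c)$$

In the bound (98) the restrictions $\sigma > 0$, $\rho \ge 1$ are assumed. Comparing (98) through (100) with (17) through (19), we see that both bounds have the same formal structure. We will take advantage of this when optimizing the expurgated bound.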

We show in Lemma IV-1 of Appendix IV that for channels symmetric from the input, $g_2(\sigma, \rho)$ is minimized by the choice $\sigma = 1/2$. Since at least half of the codes in the ensemble have a probability of error that does not exceed B, we may conclude that
Theorem 5

For $R \in [0, -g_2(1/2, 1) = E_o(1)]$ and channels symmetric from the input, there exist convolutional codes whose probability of undetected error is bounded by

$$P_u(e) < K\,2^{-\nu\rho R} \qquad (101)$$

where $\rho \ge 1$ satisfies

$$R = -\frac{1}{\rho} g_2(1/2, \rho) \qquad (102)$$

and K is finite provided G is chosen so that

$$2[g_3(1/2, \rho) - g_2(1/2, \rho)] < G < -2 g_1(1/2). \qquad (103)$$

Of course, it is necessary to show that the righthand side of (103) exceeds the lefthand side, which we do for equidistant channels in Theorem IV-1 of Appendix IV. It is interesting to point out that the expurgated exponent of Theorem 5 is the same as that obtained by Viterbi and Odenwalder [10] for maximum likelihood decoding of convolutional codes.
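As a numerical illustration of (94), (95), (102) and Lemma IV-1, the following Python sketch evaluates $g_1$ and $g_2$ for a binary symmetric channel with crossover probability 0.05 and equiprobable inputs (a hypothetical example channel, not one treated in this report). It checks that $-g_2(1/2, 1)$ equals $E_o(1)$, that $\sigma = 1/2$ minimizes $g_2(\sigma, \rho)$, and that $-g_1'(0) = I(X;Y)$, and it solves (102) for $\rho$ by bisection. Weighting the outer sum in (95) by the input distribution r(x) and using base-2 logarithms are assumptions of the sketch.

```python
import math

P = 0.05                       # hypothetical BSC crossover probability
R_IN = (0.5, 0.5)              # equiprobable inputs r(x)
W = ((1 - P, P), (P, 1 - P))   # W[x][y] = w(y|x)
W_OUT = tuple(sum(R_IN[x] * W[x][y] for x in range(2)) for y in range(2))


def g1(sigma):
    """(94): g_1(sigma) = log2 sum_y w(y|0)^(1-sigma) w(y)^sigma."""
    return math.log2(sum(W[0][y] ** (1 - sigma) * W_OUT[y] ** sigma for y in range(2)))


def g2(sigma, rho):
    """(95), with the outer sum weighted by r(x) (an assumption of this sketch)."""
    total = 0.0
    for x in range(2):
        inner = sum(W[0][y] ** sigma * W[x][y] ** (1 - sigma) for y in range(2))
        total += R_IN[x] * inner ** (1 / rho)
    return rho * math.log2(total)


def E0_one():
    """Gallager's E_o(1) for this channel: 1 - log2(1 + 2 sqrt(p(1-p)))."""
    return 1 - math.log2(1 + 2 * math.sqrt(P * (1 - P)))


def rho_for_rate(R, rho_max=1e6, iters=200):
    """Solve (102), R = -g_2(1/2, rho)/rho with rho >= 1, by bisection.

    For this channel -g_2(1/2, rho)/rho decreases in rho from E_o(1) at rho = 1.
    """
    excess = lambda rho: -g2(0.5, rho) / rho - R
    if excess(1.0) < 0:
        raise ValueError("R exceeds -g_2(1/2, 1) = E_o(1)")
    lo, hi = 1.0, rho_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("E_o(1) =", E0_one(), " -g_2(1/2,1) =", -g2(0.5, 1.0))
    print("rho for R = 0.3:", rho_for_rate(0.3))
```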

We next turn to the probability of failure. If $\mathbf{y}$ is received, a failure of order t will take place at depth 0 only if

$$L(\mathbf{s}) - L_m > 0 \quad \text{for some } \mathbf{s} \in \mathcal{F}_o^{t} \text{ and } 0 \le m \le t.$$

Hence if a failure takes place, then

$$\sum_{m=0}^{t}\sum_{\mathbf{s} \in \mathcal{F}_o^{t}} 2^{\sigma[L(\mathbf{s}) - L_m]} \ge 1 \qquad (104)$$

for all $\sigma > 0$. Letting $\varphi(\mathbf{y})$ be the failure indicator function for a fixed convolutional code C, and denoting the failure probability by $P_f^C(e)$, we can conclude (cf. (91)) that over the ensemble
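$$P\{C: P_f^C(e) > D\} \le D^{-1/\rho} E_C\left[\left(E_{\mathbf{y}}[\varphi(\mathbf{y})]\right)^{1/\rho}\right]. \qquad (105)$$

Hence the probability is at most 1/2 that a code (of constraint length $\nu > t$) will be selected whose probability of t-order failure exceeds

$$D = 2^{\rho}\left\{E_C\left[\left(E_{\mathbf{y}}\sum_{m=0}^{t}\sum_{\mathbf{s} \in \mathcal{F}_o^{t}} 2^{\sigma[L(\mathbf{s}) - L_m]}\right)^{1/\rho}\right]\right\}^{\rho}. \qquad (106)$$

The same algebra that led from (92) to (93) leads from (106) to

$$D \le 2^{\rho}\left\{\sum_{m=0}^{t} 2^{\frac{\sigma}{\rho}G(m-t)} \sum_{\mathbf{s} \in \mathcal{F}_o^{t}} E_C\left[\left(\sum_{\mathbf{y}} w(\mathbf{y}|\mathbf{u}_o)\left[\frac{w(\mathbf{y}^{t}|\mathbf{s})}{w(\mathbf{y}^{t})}\right]^{\sigma}\left[\frac{w(\mathbf{y}^{m})}{w(\mathbf{y}^{m}|\mathbf{u}_o)}\right]^{\sigma}\right)^{1/\rho}\right]\right\}^{\rho}. \qquad (107)$$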

Using the functions $g_i(\sigma, \rho)$, the righthand side of (107) can be evaluated so as to yield the bound

$$D \le 2^{\rho - t[\sigma G - \rho R - g_3(\sigma,\rho)]}\sum_{m=0}^{t} 2^{m[\sigma G + g_2(\sigma,\rho) - g_3(\sigma,\rho)]}. \qquad (108)$$

It follows directly that if, for $\sigma > 0$, $\rho \ge 1$,

$$\sigma G + g_2(\sigma, \rho) - g_3(\sigma, \rho) > 0 \qquad (109)$$

then

$$D \le K\,2^{t[\rho R + g_2(\sigma,\rho)]} \qquad (110a)$$

and otherwise

$$D \le K\,2^{t[\rho R - \sigma G + g_3(\sigma,\rho)]}. \qquad (110b)$$
Again we see that the obtained bounds (110) have the same formal structure as the bounds (20). Since, as remarked earlier, $g_2(\sigma, \rho)$ is minimized by $\sigma = 1/2$, and $g_2(1/2, \rho)$ is convex in $\rho$, we can conclude the following, where $g_2'(1/2, \rho)$ denotes $\frac{\partial}{\partial\rho} g_2(1/2, \rho)$.

Theorem 6

For $R \in [0, -g_2'(1/2, 1)]$ and channels symmetric from the input, there exist convolutional codes whose probability of failure of order t is bounded by

$$P_f(e) < K\,2^{t[\rho R + g_2(1/2, \rho)]} \qquad (111)$$

where $\rho \ge 1$ is the unique solution of

$$R = -g_2'(1/2, \rho) \qquad (112)$$

and K is finite provided

$$G > 2[g_3(1/2, \rho) - g_2(1/2, \rho)]. \qquad (113)$$

For $R \in (-g_2'(1/2, 1), -g_2(1/2, 1) = E_o(1)]$, bound (111) holds with $\rho = 1$ provided G satisfies (113) with $\rho = 1$.

It should be noted that the exponent of the bound (111) is identical to the expurgated exponent obtained previously for block codes (see Jelinek [1], p. 217).

It is further interesting to note that since

$$g_3(1/2, \rho) - g_2(1/2, \rho) \le -g_1(1/2),$$

the choice

$$G = -2g_1(1/2) = -2f_1(1/2) = -2\log\sum_y \sqrt{w(y)\,w(y|0)} \qquad (114)$$

optimizes simultaneously both the undetected error and failure bounds for all $R \in [0, -g_2(1/2, 1)]$.
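A companion sketch for Theorem 6 and (114), again for a hypothetical binary symmetric channel with crossover probability 0.05, equiprobable inputs, and base-2 logarithms. It solves (112) for $\rho$ by bisection, using a numerical derivative of $g_2(1/2, \rho)$ in $\rho$ and relying on $g_2(1/2, \rho)$ being convex in $\rho$ as stated above, and it evaluates the bias (114). For this particular channel (114) coincides with $E_o(1)$, because $(\sqrt{1-p} + \sqrt{p})^2 = 1 + 2\sqrt{p(1-p)}$.

```python
import math

P = 0.05                         # hypothetical BSC crossover probability
Z = 2 * math.sqrt(P * (1 - P))   # sum_y sqrt(w(y|0) w(y|1))


def neg_g2_half(rho):
    """-g_2(1/2, rho) for the BSC with equiprobable inputs: -rho*log2((1 + Z^(1/rho))/2)."""
    return -rho * math.log2((1 + Z ** (1 / rho)) / 2)


def neg_g2_half_prime(rho, h=1e-5):
    """-g_2'(1/2, rho), the rho-derivative, by central differences."""
    return (neg_g2_half(rho + h) - neg_g2_half(rho - h)) / (2 * h)


def rho_failure(R, rho_max=1e4, iters=200):
    """Solve (112), R = -g_2'(1/2, rho) with rho >= 1, by bisection.

    Relies on g_2(1/2, rho) being convex in rho, so -g_2'(1/2, rho) is
    nonincreasing; R must not exceed -g_2'(1/2, 1).
    """
    if R > neg_g2_half_prime(1.0):
        raise ValueError("R > -g_2'(1/2, 1): use rho = 1 instead")
    lo, hi = 1.0, rho_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if neg_g2_half_prime(mid) > R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Bias (114): G = -2 g_1(1/2) = -2 log2 sum_y sqrt(w(y) w(y|0)), with w(y) = 1/2.
G_114 = -2 * math.log2(sum(math.sqrt(0.5 * w_y0) for w_y0 in (1 - P, P)))

if __name__ == "__main__":
    R = 0.5 * neg_g2_half_prime(1.0)
    rho = rho_failure(R)
    print(f"rho = {rho:.4f}, exponent = {neg_g2_half(rho) - rho * R:.4f}, G = {G_114:.4f}")
```

The printed exponent is $-[\rho R + g_2(1/2, \rho)]$, the decay rate in t of bound (111).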



7. Expurgated Bounds for Arbitrary Values of G

In this section we describe algorithms that optimize the bounds (98) and (110) for arbitrary values of G. We do this in spite of the last assertion of the previous section, because in the range of rates of interest the G-value maximizing the Pareto exponent differs from (114). Moreover, the rate points below which optimal expurgated exponents exceed the corresponding random coding exponents for the probabilities of undetected error and of failure, respectively, are in general also different, so that, e.g., the random coding failure exponent and the expurgated undetected error exponent might apply simultaneously over some rate interval [this is shown in Section 8].

Since the bounds (98) and (110) are formally identical to the bounds (19) and (20), the optimization problem ahead of us is almost identical to that of Section 5. We will therefore simply state the exponent optimization algorithms without providing a detailed justification.

Let $\rho_R$ be the solution of

$$R = -\frac{1}{\rho} g_2(1/2, \rho) \qquad (115)$$

and let us attempt to optimize the undetected error bound when

$$I(X;Y) > G > -2g_1(1/2). \qquad (116)$$

The upper bound in (116) is due to the fact that $-g_1(\sigma)$ is a concave function with $-g_1(0) = 0$ and that

$$-g_1'(0) = \sum_y w(y|0)\log\frac{w(y|0)}{w(y)} = I(X;Y). \qquad (117)$$

Let $\sigma_G$ ($< 1/2$) be the solution of

$$G = -\frac{1}{\sigma} g_1(\sigma). \qquad (118)$$
Then clearly both (99) and (100) can only be satisfied by $\sigma \in [0, \sigma_G]$. Since by Lemma IV-3 $-\frac{1}{\rho} g_2(\sigma, \rho)$ is a decreasing function of $\rho$ and is a concave function of $\sigma$ with a maximum at $\sigma = 1/2$, in the range $\sigma \in (0, \sigma_G]$ the inequality

$$R < -\frac{1}{\rho} g_2(\sigma, \rho) \qquad (119)$$

can be satisfied only for some $\rho < \rho_R$. In (119) let $\sigma = \sigma_G$ and let $\rho_{GR}$ be the value of $\rho$ that satisfies (119) with equality. If $\rho_{GR} < 1$, then for that R-G combination an expurgated bound cannot be developed. Otherwise, we know that in any case we must choose $\rho \in (1, \rho_{GR})$ and $\sigma \in (0, \sigma_G)$ to satisfy either (99) or (100). The algorithm to find the best exponent for the case (116) is as follows:

1. Find $\sigma_G$ satisfying (118), and $\rho_{GR}$ satisfying (119) with equality for $\sigma = \sigma_G$. If $\rho_{GR} < 1$, the exponent $E_u^{exp} = 0$; stop.

2. If $\rho_{GR} \ge 1$, see whether, with $\rho = \rho_{GR}$,

$$-g_1(\sigma_G) > g_3(\sigma_G, \rho) - g_2(\sigma_G, \rho). \qquad (120)$$

If so, then (99) are satisfied and the exponent is

$$E_u^{exp} = -g_2(\sigma_G, \rho_{GR}).$$

Stop.

3. If (120) does not hold, neither (99) nor (100) holds with $\sigma = \sigma_G$, $\rho = \rho_{GR}$. Select $\rho \in (1, \rho_{GR})$ and see whether (120) holds. If so, the best exponent for that value of $\rho$ is

$$E_u^{exp}(\rho) = -g_2(\sigma_G, \rho).$$

4. If (120) does not hold for the chosen value of $\rho$, check whether

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \rho)\Big|_{\sigma = \sigma_G} > 0. \qquad (121)$$

If (121) holds and (100c) is not satisfied with $\sigma = \sigma_G$, then $\rho$ is not admissible. Otherwise

$$E_u^{exp}(\rho) = \sigma_G G - g_3(\sigma_G, \rho).$$

5. If (121) does not hold, find the largest $\sigma_1 \in (0, \sigma_G)$ satisfying

$$\sigma_1 G + g_2(\sigma_1, \rho) - g_3(\sigma_1, \rho) = 0$$

and check whether

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \rho)\Big|_{\sigma = \sigma_1} < 0. \qquad (122)$$

If (122) holds and (100c) is not satisfied with $\sigma = \sigma_1$, then $\rho$ is not admissible. Otherwise

$$E_u^{exp}(\rho) = -g_2(\sigma_1, \rho).$$

6. If (122) does not hold, find $\sigma_2$ satisfying

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \rho)\Big|_{\sigma = \sigma_2} = 0.$$

If (100c) does not hold with $\sigma = \sigma_2$, then $\rho$ is not admissible. Otherwise

$$E_u^{exp}(\rho) = \sigma_2 G - g_3(\sigma_2, \rho).$$

7. Repeat from step 3 so as to obtain a plot of $E_u^{exp}(\rho)$ for all admissible values $\rho \in (1, \rho_{GR})$. The maximum of this plot is the desired exponent $E_u^{exp}$.


We wish next to find the exponent for undetected error when

$$G < 2[g_3(1/2, \rho_R) - g_2(1/2, \rho_R)]. \qquad (123)$$

In this case $\sigma_G > 1/2$. Since (119) must be satisfied, $\rho$ can be admissible only if $\rho \le \rho_R$. Let $\rho_1$ be the largest $\rho \in [1, \rho_R]$, if it exists, such that

$$G = 2[g_3(1/2, \rho) - g_2(1/2, \rho)]. \qquad (124)$$
Conditions (99) are satisfied by $\sigma = 1/2$, $\rho = \rho_1$, and so the exponent is at least $-g_2(1/2, \rho_1)$. For $\rho < \rho_1$ the exponent would have to be smaller, and so if $\rho_1 \ge 1$ exists, we need only consider the interval $[\rho_1, \rho_R]$. Our algorithm for finding the best exponent is as follows:

1. Find $\rho_1$, if it exists. If no $\rho_1 \in [1, \rho_R]$ exists for which (124) holds, let $\rho_1 = 1$.

2. Select $\rho \in [\rho_1, \rho_R]$. Let $\sigma_2$ ($\sigma_1$) be the largest value of $\sigma \in [0, 1/2]$ (smallest value of $\sigma \in [1/2, \infty)$) for which

$$\sigma G = g_3(\sigma, \rho) - g_2(\sigma, \rho) \qquad (125)$$

and define

$$\sigma_3 = \min(\sigma_1, \sigma_G).$$

3. If

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \rho)\Big|_{\sigma = \sigma_3} > 0, \qquad (126)$$

check if (127) holds with $\sigma = \sigma_3$:

$$\rho R < \sigma G - g_3(\sigma, \rho). \qquad (127)$$

If (127) does not hold, $\rho$ is inadmissible. If it does hold, then

$$E_u^{exp}(\rho) = \sigma_3 G - g_3(\sigma_3, \rho).$$

4. If (126) does not hold and

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \rho)\Big|_{\sigma = \sigma_2} < 0, \qquad (128)$$

check if (127) holds with $\sigma = \sigma_2$. If it does not, $\rho$ is inadmissible. If it holds, then

$$E_u^{exp}(\rho) = \sigma_2 G - g_3(\sigma_2, \rho).$$

5. If neither (126) nor (128) holds, let $\sigma_4$ be the unique value of $\sigma$ satisfying

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \rho) = 0.$$

If (127) holds with $\sigma = \sigma_4$, then

$$E_u^{exp}(\rho) = \sigma_4 G - g_3(\sigma_4, \rho);$$

otherwise $\rho$ is inadmissible.

6. Repeat from step 2 on so as to obtain a plot of $E_u^{exp}(\rho)$ for all admissible values $\rho \in [\rho_1, \rho_R]$. The maximum is the desired exponent $E_u^{exp}$.


Finally, we wish to find the best expurgated exponent for the probability of failure when

$$G < 2[g_3(1/2, \mu_R) - g_2(1/2, \mu_R)] \qquad (129)$$

where $\mu_R$ satisfies

$$R = -g_2'(1/2, \mu). \qquad (130)$$

Our search algorithm is as follows:

1. Find $\mu_m$ ($\mu_M$), the largest $\mu \in [1, \mu_R)$ (the smallest $\mu > \mu_R$) such that

$$G = 2[g_3(1/2, \mu) - g_2(1/2, \mu)].$$

If $\mu_m$ does not exist, set $\mu_m = 1$.

2. Choose $\mu \in (\mu_m, \mu_M)$ and find $\sigma_1$ satisfying

$$G - \frac{\partial}{\partial\sigma} g_3(\sigma, \mu)\Big|_{\sigma = \sigma_1} = 0.$$

If

$$\sigma_1 G + g_2(\sigma_1, \mu) - g_3(\sigma_1, \mu) \le 0 \qquad (131)$$

then

$$E_f^{exp}(\mu) = -[\mu R - \sigma_1 G + g_3(\sigma_1, \mu)].$$

3. If (131) does not hold, and $\sigma_1 < 1/2$ [$\sigma_1 > 1/2$], find the unique $\sigma_2 \in (\sigma_1, 1/2)$ [$\sigma_2 \in (1/2, \sigma_1)$] such that

$$\sigma_2 G = g_3(\sigma_2, \mu) - g_2(\sigma_2, \mu).$$

Then

$$E_f^{exp}(\mu) = -[\mu R + g_2(\sigma_2, \mu)].$$

4. Repeat from step 2 on so as to obtain a plot of $E_f^{exp}(\mu)$ for $\mu \in (\mu_m, \mu_M)$. The maximum is the desired exponent $E_f^{exp}$.


8. Performance Curves for Gaussian Channels with Binary Inputs

In this section we first apply our exponent optimization procedures to quantized Gaussian additive noise channels with binary inputs. Figure 4 concerns binary output quantization applied to a channel whose SNR is 1.5 dB per transmitted bit (this channel has $R_{comp} = .485$). In Figure 5 the quantization is optimal uniform octal and the SNR is -.3 dB per transmitted bit (here $R_{comp} = .51$). Finally, in Figure 6 the quantization is again octal, but the SNR = -2.0 dB ($R_{comp} = .375$).

Each of the figures contains curves of the failure and undetected error exponents as a function of the rate R. There are three curves of each type: the first curve corresponds to the usual choice G = R. The second curve corresponds to the choice

$$G = -\frac{1+\sigma}{\sigma} f_1\left(\frac{\sigma}{1+\sigma}\right) \qquad (132a)$$

for

$$E_o(1) \le R = \frac{1}{\sigma}E_o(\sigma) \le C \qquad (132b)$$

and

$$G = -2f_1(1/2) = -2g_1(1/2) \qquad (133a)$$

for

$$0 \le R \le E_o(1), \qquad (133b)$$

which is the largest possible G optimizing the undetected error exponent. The third curve corresponds to the choice

$$G = \frac{1+\eta}{\eta}\left[E_o(\eta) + f_3\left(\frac{1}{1+\eta}, \eta\right)\right] \qquad (134a)$$

for

$$E_o'(1) \le R = E_o'(\eta) \le C, \qquad (134b)$$

$$G = 2\left[E_o(1) + f_3(1/2, 1)\right] = 2\left[g_3(1/2, 1) - g_2(1/2, 1)\right] \qquad (135a)$$

for

$$-g_2'(1/2, 1) \le R \le E_o'(1), \qquad (135b)$$

and

$$G = 2[g_3(1/2, \mu) - g_2(1/2, \mu)] \qquad (136a)$$

for

$$0 \le R = -g_2'(1/2, \mu) \le -g_2'(1/2, 1), \qquad (136b)$$

which is the smallest G value possible that optimizes the failure exponent. The three figures show the performance degradations incurred by a non-optimal bias assignment. Especially interesting is the substantial failure exponent degradation that results from the customary assignment G = R. The corresponding weakening of the undetected error exponents at low rates should also be noted. It is hard to say whether this phenomenon is real or simply reflects the inadequacy of the bounds.

Figure 7, the last presented in this paper, gives the Pareto exponents for the three kinds of channels (see above) when G is selected so as to optimize the Pareto or failure exponents, respectively.
Appendix I

Derivation of the Fundamental Bound

In this appendix we prove the validity of the bound (9). Let the set $\mathfrak{B}$ denote either $\mathcal{G}^*$ or $\mathcal{F}$ (see definitions preceding (3) in Section 2) and let $\mathbf{s}^*$ be the path taken by the encoder. Then, assuming $\delta \in (0, 1]$,

vectors given a fixed $\mathbf{y}$ and $\mathbf{x}^*$ (which, due to the code ensemble structure, is independent of $\mathbf{s}$). Let $\|\mathfrak{B}\|_t$ denote the number of sequences $\mathbf{s}$ of length t in the set $\mathfrak{B}$. Then if $\ell(\mathcal{F}_t) = t$ and $\ell(\mathcal{G}_t) = t - \nu$,

$$\|\mathfrak{B}\|_t \le 2^{\ell(\mathfrak{B}) n R} \qquad (I-2)$$

where R is the rate of the code. We now have two cases.
Case I: $m > t$

The righthand side of (I-1) is equal to (with $\mathbf{y}^{m}$ denoting the sequence ...)

$$2^{\delta n \ell(\mathfrak{B}) R + (m-t) f_1(\sigma\delta) + t f_2(\sigma, \delta)} \qquad (I-3)$$
Relations (I-3) and (I-4) substantiate the top and bottom bounds of (9), respectively.
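Appendix II

Relations Between Functions $f_i$, $E_o$

Theorem II-1

$$\frac{1+\gamma}{\gamma}\left[f_3\left(\frac{1}{1+\gamma}, \gamma\right) + E_o(\gamma)\right] \le \frac{1}{\gamma}E_o(\gamma) \le -\frac{1+\gamma}{\gamma} f_1\left(\frac{\gamma}{1+\gamma}\right) \qquad (II-1)$$

with equality on either side if and only if

$$\sum_x r(x)\left[\frac{w(y|x)}{w(y)}\right]^{\frac{1}{1+\gamma}} = \text{const for all } y. \qquad (II-2)$$

Proof

Using Hölder's inequality,

$$2^{f_1\left(\frac{\gamma}{1+\gamma}\right)} = \sum_y w(y)\sum_x r(x)\left[\frac{w(y|x)}{w(y)}\right]^{\frac{1}{1+\gamma}} \le \left\{\sum_y w(y)\left(\sum_x r(x)\left[\frac{w(y|x)}{w(y)}\right]^{\frac{1}{1+\gamma}}\right)^{1+\gamma}\right\}^{\frac{1}{1+\gamma}} = 2^{\frac{1}{1+\gamma} f_2\left(\frac{1}{1+\gamma}, \gamma\right)}$$

with equality if and only if (II-2) holds. This establishes the righthand side of (II-1). Similarly,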
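$$2^{f_3\left(\frac{1}{1+\gamma}, \gamma\right)} = \sum_y w(y)\left(\sum_x r(x)\left[\frac{w(y|x)}{w(y)}\right]^{\frac{1}{1+\gamma}}\right)^{\gamma} \le \left\{\sum_y w(y)\left(\sum_x r(x)\left[\frac{w(y|x)}{w(y)}\right]^{\frac{1}{1+\gamma}}\right)^{1+\gamma}\right\}^{\frac{\gamma}{1+\gamma}} = 2^{\frac{\gamma}{1+\gamma} f_2\left(\frac{1}{1+\gamma}, \gamma\right)} \qquad (II-3)$$

with equality if and only if (II-2) holds. As a consequence

$$\frac{1+\gamma}{\gamma}\left[f_3\left(\frac{1}{1+\gamma}, \gamma\right) + E_o(\gamma)\right] \le \frac{1+\gamma}{\gamma}\left[-\frac{\gamma}{1+\gamma}E_o(\gamma) + E_o(\gamma)\right] = \frac{1}{\gamma}E_o(\gamma),$$

so that the lefthand inequality of (II-1) holds as well. Q.E.D.

Since $E_o(\gamma) = -f_2\left(\frac{1}{1+\gamma}, \gamma\right)$, relation (II-1) establishes that for every $\delta \in (0, 1)$, G can be chosen so as to satisfy (35).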
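Appendix III

Properties of $f_i(\sigma, \delta)$ Functions

Theorem III-1

The function $f_1(\lambda)$ is convex. $f_1(0) = 0$, and $f_1(1) \le 0$ with equality if and only if $w(y|x) > 0$ whenever $r(x) > 0$. Finally, $f_1'(0) = -I(X;Y)$, where $I(X;Y)$ is the mutual information between X and Y.

Proof

Let

$$X_y = \frac{w(y|x)}{w(y)}. \qquad (III-1)$$

Then

$$f_1(1-\lambda) = \log E_y\left[E_x X_y^{\lambda}\right]. \qquad (III-2)$$

If $\lambda = \theta\lambda_1 + (1-\theta)\lambda_2$ with $\theta \in (0, 1)$, then by Hölder's inequality

$$f_1(1-\lambda) = \log E_y\left[E_x X_y^{\theta\lambda_1 + (1-\theta)\lambda_2}\right] \le \log\left\{E_y\left[E_x X_y^{\lambda_1}\right]\right\}^{\theta}\left\{E_y\left[E_x X_y^{\lambda_2}\right]\right\}^{1-\theta} = \theta f_1(1-\lambda_1) + (1-\theta) f_1(1-\lambda_2). \qquad (III-3)$$

(III-3) proves the convexity of $f_1(\lambda)$. Since $E_x X_y = 1$, $f_1(0) = 0$. Next,

$$f_1(1) = \lim_{\lambda \downarrow 0}\log E_y\left[E_x X_y^{\lambda}\right] = \log\sum_y w(y)\sum_{x:\,w(y|x) > 0} r(x) \le 0$$

with equality if and only if $w(y|x) > 0$ whenever $r(x) > 0$. Finally,

$$\frac{\partial}{\partial\lambda} f_1(1-\lambda) = \frac{E_y\left[E_x X_y^{\lambda}\log X_y\right]}{E_y\left[E_x X_y^{\lambda}\right]},$$

so

$$\lim_{\lambda \downarrow 0} f_1'(\lambda) = -\lim_{\lambda \uparrow 1}\frac{\partial}{\partial\lambda} f_1(1-\lambda) = -E_y E_x X_y\log X_y = -\sum_y w(y)\sum_x r(x)\frac{w(y|x)}{w(y)}\log\frac{w(y|x)}{w(y)} = -I(X;Y).$$

Q.E.D.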


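Theorem III-2

$f_2(\sigma, \delta)$ is convex in $\sigma$; $f_2(0, \delta) \le 0$ with equality if and only if $w(y|x) > 0$ whenever $r(x) > 0$; and $f_2(1, \delta) = f_1(\delta)$. Thus for $0 < \delta < 1$, $f_2(1, \delta) < 0$.

Proof

Using (III-1),

$$f_2(\sigma, \delta) = \log E_y\left(E_x X_y^{\sigma}\right)^{\delta}\left(E_x X_y^{1-\sigma\delta}\right). \qquad (III-4)$$

Let $\theta \in (0, 1)$. Then

$$f_2(\theta\sigma_1 + (1-\theta)\sigma_2, \delta) = \log E_y\left(E_x X_y^{\theta\sigma_1 + (1-\theta)\sigma_2}\right)^{\delta}\left(E_x X_y^{\theta(1-\sigma_1\delta) + (1-\theta)(1-\sigma_2\delta)}\right)$$

$$\le \log E_y\left[\left(E_x X_y^{\sigma_1}\right)^{\delta}\left(E_x X_y^{1-\sigma_1\delta}\right)\right]^{\theta}\left[\left(E_x X_y^{\sigma_2}\right)^{\delta}\left(E_x X_y^{1-\sigma_2\delta}\right)\right]^{1-\theta}$$

$$\le \theta f_2(\sigma_1, \delta) + (1-\theta) f_2(\sigma_2, \delta), \qquad (III-5)$$

so that $f_2(\sigma, \delta)$ is indeed convex. Since $E_x X_y = 1$,

$$f_2(0, \delta) = \lim_{\sigma \downarrow 0}\log E_y\left(E_x X_y^{\sigma}\right)^{\delta} \le 0$$

with equality if and only if

$$\lim_{\sigma \downarrow 0} E_x X_y^{\sigma} = 1 \quad \text{for all } y,$$

i.e., if and only if $w(y|x) > 0$ whenever $r(x) > 0$.

The fact that $f_2(1, \delta) = f_1(\delta)$ follows directly from (III-2) and (III-4).

Q.E.D.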


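Theorem III-3

For $\rho > 0$, $f_2(\rho/\delta, \delta)$ is a convex decreasing function of $\delta$.

Proof

We first prove convexity. Let $\delta = \theta\delta_1 + (1-\theta)\delta_2$, where $\theta \in (0, 1)$. Also, let

$$\alpha = \frac{\theta\delta_1}{\delta}, \qquad 1 - \alpha = \frac{(1-\theta)\delta_2}{\delta} \qquad (III-6)$$

so that

$$\frac{1}{\delta} = \alpha\frac{1}{\delta_1} + (1-\alpha)\frac{1}{\delta_2}. \qquad (III-7)$$

Then

$$f_2\left(\frac{\rho}{\delta}, \delta\right) = \log E_y\left(E_x X_y^{\alpha\rho/\delta_1 + (1-\alpha)\rho/\delta_2}\right)^{\delta}\left(E_x X_y^{1-\rho}\right) \le \log E_y\left(E_x X_y^{\rho/\delta_1}\right)^{\theta\delta_1}\left(E_x X_y^{\rho/\delta_2}\right)^{(1-\theta)\delta_2}\left(E_x X_y^{1-\rho}\right)$$

$$\le \theta f_2(\rho/\delta_1, \delta_1) + (1-\theta) f_2(\rho/\delta_2, \delta_2), \qquad (III-8)$$

which proves convexity.

Next, after some algebra, the derivative $\frac{\partial}{\partial\delta} f_2(\rho/\delta, \delta)$ can be written as an expression with the factor $2^{-f_2(\rho/\delta, \delta)}$ that is shown to be non-positive by use of the inequality $\log x \le x - 1$. Q.E.D.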

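Theorem III-4

The function $f_3(\sigma, \delta)$ is convex in $\sigma$. $f_3(1, \delta) = 0$, and $f_3(0, \delta) \le 0$ with equality if and only if $w(y|x) > 0$ whenever $r(x) > 0$.

Proof

Using (III-1),

$$f_3(\sigma, \delta) = \log E_y\left(E_x X_y^{\sigma}\right)^{\delta}.$$

Thus if $\theta \in (0, 1)$, then

$$f_3(\theta\sigma_1 + (1-\theta)\sigma_2, \delta) \le \log E_y\left(E_x X_y^{\sigma_1}\right)^{\theta\delta}\left(E_x X_y^{\sigma_2}\right)^{(1-\theta)\delta} \le \theta f_3(\sigma_1, \delta) + (1-\theta) f_3(\sigma_2, \delta).$$

Next, $f_3(1, \delta) = 0$ because $E_x X_y = 1$. Finally,

$$\lim_{\sigma \downarrow 0} f_3(\sigma, \delta) = \lim_{\sigma \downarrow 0} f_2(\sigma, \delta) \le 0$$

with equality if and only if $w(y|x) > 0$ whenever $r(x) > 0$. Q.E.D.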


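Theorem III-5

$\frac{1}{\delta}\left[E_o(\delta) + f_3\left(\frac{1}{1+\delta}, \delta\right)\right]$ is a non-negative function of $\delta > 0$.

Proof

Since by Hölder's inequality

$$E_x X_y^{\frac{1}{1+\delta}} \le \left(E_x X_y\right)^{\frac{1}{1+\delta}} = 1,$$

then

$$f_3\left(\frac{1}{1+\delta}, \delta\right) = \log E_y\left(E_x X_y^{\frac{1}{1+\delta}}\right)^{\delta} \ge \log E_y\left(E_x X_y^{\frac{1}{1+\delta}}\right)^{1+\delta} = f_2\left(\frac{1}{1+\delta}, \delta\right) = -E_o(\delta).$$

Therefore the function is indeed non-negative for all $\delta > 0$.

Q.E.D.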
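The identities of Theorems III-1 and III-2, the relation $E_o(\gamma) = -f_2\left(\frac{1}{1+\gamma}, \gamma\right)$ used in Appendix II, inequality (II-1), and Theorem III-5 can be spot-checked numerically. The sketch below assumes the forms of $f_1$, $f_2$, $f_3$ read off from (III-2), (III-4) and the proof of Theorem III-4, base-2 logarithms, and a hypothetical two-input, three-output channel chosen only for illustration; it is a sanity check, not part of the proofs.

```python
import math

# Hypothetical two-input, three-output channel and input distribution r(x).
R_IN = (0.4, 0.6)
W = ((0.7, 0.2, 0.1),   # w(y | x = 0)
     (0.1, 0.3, 0.6))   # w(y | x = 1)
XS, YS = range(2), range(3)
W_OUT = [sum(R_IN[x] * W[x][y] for x in XS) for y in YS]


def ex_pow(y, s):
    """E_x X_y^s, where X_y = w(y|x)/w(y) as in (III-1)."""
    return sum(R_IN[x] * (W[x][y] / W_OUT[y]) ** s for x in XS)


def f1(sigma):
    """From (III-2): f_1(sigma) = log2 E_y E_x X_y^(1 - sigma)."""
    return math.log2(sum(W_OUT[y] * ex_pow(y, 1 - sigma) for y in YS))


def f2(sigma, delta):
    """(III-4): f_2 = log2 E_y (E_x X_y^sigma)^delta (E_x X_y^(1 - sigma*delta))."""
    return math.log2(sum(W_OUT[y] * ex_pow(y, sigma) ** delta * ex_pow(y, 1 - sigma * delta)
                         for y in YS))


def f3(sigma, delta):
    """Proof of Theorem III-4: f_3 = log2 E_y (E_x X_y^sigma)^delta."""
    return math.log2(sum(W_OUT[y] * ex_pow(y, sigma) ** delta for y in YS))


def E0(delta):
    """Gallager's function: -log2 sum_y (sum_x r(x) w(y|x)^(1/(1+delta)))^(1+delta)."""
    s = 1 / (1 + delta)
    return -math.log2(sum(sum(R_IN[x] * W[x][y] ** s for x in XS) ** (1 + delta) for y in YS))


def mutual_information():
    """I(X;Y) in bits."""
    return sum(R_IN[x] * W[x][y] * math.log2(W[x][y] / W_OUT[y]) for x in XS for y in YS)


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0):
        print(d, f2(1 / (1 + d), d) + E0(d), E0(d) + f3(1 / (1 + d), d))
```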



Appendix IV 


Properties of g . (g, 6) Functions 

Lemma IV- 1 

For all p > 0, g^c, P) is minimized by the choice a = 1/2. 

Proof 

Consider any input letter $x' \ne 0$. By definition of channels symmetrical
from the input [see Jelinek [1], p. 201], there exists a permutation $\pi$ of
outputs y such that

$$w(y/0) = w(\pi(y)/x') \quad \text{for all } y \tag{IV-1}$$

and a permutation $\pi^*$ of inputs x such that

$$w(y/x) = w(\pi(y)/\pi^*(x)) \quad \text{for all } x, y \tag{IV-2}$$

Therefore, since summing over all $y$ (all $x$) is unchanged by the permutation $\pi$ ($\pi^*$),

$$\sum_x \Big(\sum_y w(y/0)^{\sigma} w(y/x)^{1-\sigma}\Big)^{1/\rho} = \sum_x \Big(\sum_y w(\pi(y)/x')^{\sigma} w(\pi(y)/\pi^*(x))^{1-\sigma}\Big)^{1/\rho} = \sum_x \Big(\sum_y w(y/x')^{\sigma} w(y/x)^{1-\sigma}\Big)^{1/\rho} \tag{IV-3}$$

and (IV-3) holds for all x'. Thus, with a denoting the number of input letters, we can write

$$g_2(\sigma, \rho) = \rho \log \frac{1}{a} \sum_{x, x'} \Big(\sum_y w(y/x')^{\sigma} w(y/x)^{1-\sigma}\Big)^{1/\rho} \tag{IV-4}$$

It is well known that the right-hand side of (IV-4) is minimized by the
choice $\sigma = 1/2$ (see Jelinek [1], p. 246, problem 7.28). Q.E.D.
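A quick numerical sanity check of Lemma IV-1 can be sketched as follows. It evaluates the sum on the left of (IV-3) on a grid of $\sigma$ for two channels that are symmetric from the input (a BSC and a 4-ary symmetric channel chosen only for illustration) and reports the minimizing $\sigma$. Because $\rho \log(\cdot)$ is increasing for $\rho > 0$, minimizing this sum is assumed here to be equivalent to minimizing $g_2(\sigma, \rho)$; the function and channel names are hypothetical.

```python
def bsc(p):
    """Transition matrix w[x][y] of a binary symmetric channel."""
    return [[1 - p, p], [p, 1 - p]]

def qsc(eps, a=4):
    """a-ary symmetric channel: correct with prob. 1-eps, each wrong letter eps/(a-1)."""
    return [[1 - eps if y == x else eps / (a - 1) for y in range(a)] for x in range(a)]

def s_sum(w, sigma, rho):
    """Left-hand side of (IV-3): sum_x (sum_y w(y/0)^sigma w(y/x)^(1-sigma))^(1/rho)."""
    total = 0.0
    for row in w:                       # row = w(./x)
        inner = sum(w0 ** sigma * wx ** (1 - sigma) for w0, wx in zip(w[0], row))
        total += inner ** (1 / rho)
    return total

def argmin_sigma(w, rho, steps=1000):
    """Grid minimizer of s_sum over sigma in (0, 1); the grid contains 1/2 exactly."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda s: s_sum(w, s, rho))

print(argmin_sigma(bsc(0.05), 1.0), argmin_sigma(qsc(0.1), 2.0))
```

For both channels the minimizer returned is 0.5, as the lemma predicts; the symmetry $\sigma \leftrightarrow 1-\sigma$ of each inner sum is what the permutation argument above exploits.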

Define an equidistant symmetrical channel (cf. Jelinek [1], p. 230)
as a channel symmetrical from the input that also satisfies

$$\sum_y \sqrt{w(y/0)\,w(y/x)} = \alpha \quad \text{for all } x \ne 0$$

£ w(y/°) = Y for all x / 0 

y 

Theorem IV-1

For equidistant symmetrical channels and all $\rho \ge 1$,


$$g_3(1/2, \rho) + g_1(1/2) \le g_2(1/2, \rho)$$

Proof

Instead of (IV-6) we will prove that

$$\exp\Big\{\frac{1}{\rho}\big[g_3(1/2, \rho) + g_1(1/2)\big]\Big\} \le \exp\Big\{\frac{1}{\rho}\, g_2(1/2, \rho)\Big\}$$


Let 


Then $\phi$ is an increasing, concave function for $\rho \ge 1$. If we let

** = [£ -M f ] [ I && ) <3$ f 

y y 


and 




then our task is to prove that 


- 1 * (a i> 


(TV-5) 

(IV-6) 

(IV- 7) 
(IV-8) 

(IV- 9) 
(IV- 10) 

(IV- 11) 


x 


X 



or, utilizing condition (IV-5), that


$$\phi(a_o) + (a-1)\,\phi(a_1) \le \phi(a_o') + (a-1)\,\phi(a_1') \tag{IV-12}$$

It follows from a trivial modification of Theorem 108 on p. 89 of Hardy,
Littlewood, and Pólya [11] that (IV-12) holds if

a > a . , a 1 > a ' 
o—l o—l 

(IV- 13) 

a > a ' , a + (a- 1) a , < a ' + (a- 1) a' 
o — o o 1 — o 1 

We must therefore prove that (IV-13) is indeed satisfied. Now, by
Hölder's inequality,

y 


[i*wwn 


3/2 


2/3 


il/isl 


3/2 


1/3 


■ l «« f 


(IV- 14) 


where the last step is due to the symmetry conditions (IV-1) and (IV-2).


(IV-14) proves $a_o \ge a_1$. Next,


1/2 


1/2 




1/2 


w(y) 


• [i ^>e ^)] i/2 

y 


= a' 
o 


(IV- 15) 
3/4


ill o) 


1/4 


4i^>e^) 3/ Y K^>(^n 

y y 

However, a = 1 so that a 2 = a < a ' . Finally, we must substantiate the 
' O o o o 

last inequality in (IV-13). But because of the symmetry of the channel,

X ^2 

VV , . ^( y / 0 ) \ /w(y/xl^ l/2 _V fw (ir ( y ) /it* (x]) 1 

£ w( y) ( w (y) / \ w(y) ) LL\ ) w(Tr(y)) L w(ir(y)) J 


w(y) 

1/2 


1/2 1/2 

7 ' n‘/ 2 


x y 


x y 




1/2 


(IV- 16) 


x y 


where the permutations $\pi$ and $\pi^*$ are those referred to in the proof of
Lemma IV-1. Since (IV-16) holds for all x', we get


a + (a- 1) a 
o 


l=l a * = {al 


w(y/x') 
w(y) 


-.1/2 


} 


x' 


x' x y 


V w(y) J I 




(IV- 17) 


On the other hand,
a- ♦ (a.l)a' . ^ J £w( y ) (SJlM


w(y) 


) (=sr) }* 


x y 




1/2 


x-' x y 


1/2 2 


■ ilMKrUPr ] 


(IV- 18) 


Since (IV-17) has the form $(E\,Z)^2$ and (IV-18) has the form $E(Z^2)$, and $(E\,Z)^2 \le E(Z^2)$, the
last relation of (IV-13) holds, and the theorem is proven.

Q.E.D. 


Lemma IV-2

For any $\rho > 0$, the functions $g_1(\sigma)$, $g_2(\sigma, \rho)$, and $g_3(\sigma, \rho)$ are convex
in $\sigma$.


Proof

By Hölder's inequality,

_ . . 9 o . + ( 1 - 0 ) o-> 

gjfSffj + (1-9)^) = l<>g£ w(y/0)[^^y] 

y 

<Iog [Xw(y/0,(^y) ‘] [£w( y /0)[^L] 2 ] = 

y y 


$$= \theta\, g_1(\sigma_1) + (1-\theta)\, g_1(\sigma_2)$$
so $g_1(\sigma)$ is convex. Similarly,





= e ° ,+(1 ~ 9,CT2 ] 


i/p 


X y 


,9 

C 1-TP 


S»*il ‘] '] 


1-9 
p 


x y 


$$\le \theta\, g_2(\sigma_1, \rho) + (1-\theta)\, g_2(\sigma_2, \rho)$$

so $g_2(\sigma, \rho)$ is convex as well. The convexity of $g_3(\sigma, \rho)$ is proven in the
same way.

Q.E.D. 

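Lemma IV-3

For any fixed $\sigma \in (0,1)$, $g_2(\sigma, \rho)/\rho$ is an increasing function of $\rho > 0$.

Proof

$$\frac{d}{d\rho}\,\frac{g_2(\sigma, \rho)}{\rho} = -\frac{1}{\rho^2}\Big[\sum_x \Big(\sum_y w(y/0)^{\sigma} w(y/x)^{1-\sigma}\Big)^{1/\rho}\Big]^{-1} h'\!\left(\frac{1}{\rho}\right), \qquad h(\lambda) = \sum_x \Big(\sum_y w(y/0)^{\sigma} w(y/x)^{1-\sigma}\Big)^{\lambda}$$

But

$$h'(\lambda) = \sum_x \Big(\sum_y w(y/0)^{\sigma} w(y/x)^{1-\sigma}\Big)^{\lambda} \log \sum_y w(y/0)^{\sigma} w(y/x)^{1-\sigma}$$

and for $\sigma \in (0,1)$, by Hölder's inequality,

$$\sum_y w(y/0)^{\sigma} w(y/x)^{1-\sigma} \le \Big(\sum_y w(y/0)\Big)^{\sigma} \Big(\sum_y w(y/x)\Big)^{1-\sigma} = 1$$

so that $h'(\lambda) \le 0$. Therefore

$$\frac{d}{d\rho}\,\frac{g_2(\sigma, \rho)}{\rho} \ge 0$$

for $\rho > 0$.

Q.E.D.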
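Lemma IV-3 can likewise be spot-checked numerically. The sketch below assumes that $g_2(\sigma,\rho)/\rho$ equals $\log \sum_x (\sum_y w(y/0)^\sigma w(y/x)^{1-\sigma})^{1/\rho}$ up to a constant that does not depend on $\rho$, as the derivative in the proof indicates, and confirms that this quantity grows with $\rho$ for a BSC; the helper name is hypothetical.

```python
import math

def scaled_g2(w, sigma, rho):
    """log2 sum_x (sum_y w(y/0)^sigma w(y/x)^(1-sigma))^(1/rho): g_2(sigma, rho)/rho
    up to an additive constant (assumed form)."""
    total = 0.0
    for row in w:
        inner = sum(a ** sigma * b ** (1 - sigma) for a, b in zip(w[0], row))
        # Hoelder: inner <= 1, so inner**(1/rho) can only grow as rho grows
        total += inner ** (1 / rho)
    return math.log2(total)

bsc = [[0.9, 0.1], [0.1, 0.9]]
rhos = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
values = [scaled_g2(bsc, 0.3, r) for r in rhos]
print(all(a < b for a, b in zip(values, values[1:])))   # prints True
```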





Lemma IV-4 


-g 2 ' d/2, p ) 



h(x) l/p 



+ log a 


where

$$h(x) = \sum_y \sqrt{w(y/0)\,w(y/x)}$$

Proof


It involves simple algebra and is omitted.


Q.E.D. 
Figure Captions


Figure 1: A trellis for a rate R = 1/2 code.

Figure 2: Partial tree of a code of rate R = 2/n.

Figure 3: Graphical maximization of s* leading to an asymptotically
optimal upper bound to Q(a).

Figure 4: Undetected error (top curves) and failure exponents
(bottom curves) for a binary output quantized Gaussian
channel with SNR equal to 1.5 dB per transmitted bit
when different bias values are used.

Figure 5: Undetected error (top curves) and failure exponents
(bottom curves) for an octal output quantized Gaussian
channel with SNR equal to -0.3 dB per transmitted bit
when different bias values are used.

Figure 6: Undetected error (top curves) and failure exponents
(bottom curves) for an octal output quantized Gaussian
channel with SNR equal to -2.0 dB per transmitted bit
when different bias values are used.

Figure 7: ... octal quantized channels of Figures 4, 5, and 6 when the bias
... (better curve) or failure (worse curve) exponents.
II-E. Bootstrap Trellis Decoding
1. Description of the Rudimentary Decoder

Bootstrap trellis decoding is based on a convolutional code of
constraint length $\nu_b$ (in branches) and its truncated version that is
obtained by eliminating all but the first $\mu_b$ digits of each
generator defining the original code. The truncated code therefore has
$2^{\mu_b - 1}$ trellis states per level. We will assume $\nu_b$ to be so large that
at the SNR used, the probability of error of the corresponding maximum
likelihood (Viterbi) decoding would be negligible compared to the
probability of error resulting from the scheme described below
(see Section 3).

The rudimentary binary bootstrap trellis decoding algorithm
is as follows:

1) m-1 streams of binary data of length N are encoded using
the same $\nu_b$-constraint-length code, and an mth stream is created
using mod 2 position-by-position addition of the m-1 streams.

2) The m streams are transmitted through the channel, and
the receiver creates an appropriate state stream as in Bootstrap
Sequential Decoding [3].

3) A $\mu_b$-truncated trellis decoder is used to decode the first
stream, its metrics at depth i,

$$\log \frac{w_m(y_i, z_i/x_i)}{w_m(y_i, z_i)} - R \tag{1}$$

being based on m, the number of streams in a block, on the
transmitted and received digits $x_i$ and $y_i$, and on the state stream digits
$z_i$. The bias R corresponds to the convolutional rate. To each depth
i of the N-branch-long codeword there correspond $2^{\mu_b - 1}$ likelihoods,
the maximum of these at depth n being denoted by $L_n$. Let

$$L_n^M = \max_{1 \le i \le n} L_i$$

so that $L_n^M$ is a monotone nondecreasing function of $n \in \{1, \ldots, N\}$ (N is the
stream length in branches). Let $\theta$ be some suitably chosen threshold.
If $L_n^M - L_n < \theta$ for all n, the decoder accepts the decoded first stream
information sequence; otherwise it rejects it (in fact, it will stop
decoding whenever a depth n is reached for which $L_n^M - L_n \ge \theta$).

4) If the 1st stream was accepted, it is replaced by the estimated
transmitted stream, the state stream is accordingly recalculated, and
the decoder proceeds to decode the 2nd stream as in step 3, using a
metric table appropriate to m-1 undecoded streams (the subscript m in
(1) is replaced by m-1).

5) If the 1st stream was rejected, 2nd stream decoding proceeds
exactly as in step 3 with no change to either metric or state stream.

6) Steps 3 through 5 establish a pattern that is adhered to in
general: after every acceptance the state stream and metrics are
recalculated, and decoding of the "round robin" next stream begins.

7) Decoding terminates in either of 2 ways:

(a) SUCCESS: all m streams are finally accepted.

(b) FAILURE: when $\ell$ streams ($\ell \le m$) remain undecoded,
$\ell$ successive attempts at stream decoding end with
rejection.
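Steps 3 through 7 above can be sketched in Python as follows. This is a minimal illustration, not the decoder used in the study: the per-depth values $L_n$ are taken as given (in practice they come from the $\mu_b$-truncated trellis decoder with metric (1)), and `try_stream` is a hypothetical callback standing in for one complete trellis decoding attempt with the metric table for k undecoded streams.

```python
from typing import Callable, List, Sequence, Tuple

def threshold_test(L: Sequence[float], theta: float) -> Tuple[bool, int]:
    """Step 3: accept iff L_n^M - L_n < theta at every depth n.

    L[n-1] holds L_n, the largest likelihood among the trellis states at depth n.
    Returns (accepted, depth at which decoding stopped)."""
    running_max = float("-inf")
    for n, value in enumerate(L, start=1):
        running_max = max(running_max, value)    # L_n^M = max_{1<=i<=n} L_i
        if running_max - value >= theta:          # drop of theta or more: reject
            return False, n
    return True, len(L)

def bootstrap_schedule(m: int, try_stream: Callable[[int, int], bool]) -> Tuple[bool, List[int]]:
    """Steps 3-7: round-robin decoding of the m streams of a block.

    try_stream(s, k) is a hypothetical callback: one decoding attempt of stream s
    while k streams are still undecoded; True means the stream was accepted."""
    undecoded = list(range(m))   # streams not yet accepted, in round-robin order
    accepted: List[int] = []
    pos = 0                      # position in `undecoded` of the next stream to try
    rejections = 0               # successive rejections since the last acceptance
    while undecoded:
        k = len(undecoded)
        if rejections == k:      # step 7(b): k successive rejections -> FAILURE
            return False, accepted
        s = undecoded[pos]
        if try_stream(s, k):
            # step 4: the state stream and metric table would be recomputed here
            accepted.append(s)
            undecoded.pop(pos)   # the next stream in round-robin order moves to `pos`
            if undecoded:
                pos %= len(undecoded)
            rejections = 0
        else:
            # step 5: nothing changes; move on to the next stream
            rejections += 1
            pos = (pos + 1) % k
    return True, accepted        # step 7(a): SUCCESS
```

For example, a callback that rejects stream 0 only while all three streams of a block are still undecoded yields SUCCESS with the acceptance order [1, 2, 0].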

2. Bounds on the Probability of Failure

In this section we will obtain upper and lower bounds on the
probability of failure or error. Let $A_k(\ell)$ and $F_k(\ell)$ denote the events
that, when m-k streams have been correctly decoded, the $\ell$th of the k
remaining streams has been decoded in error and has failed the
threshold test, respectively. Let $\bar A_k(\ell)$ and $\bar F_k(\ell)$ denote the
complements of these events when m-k streams have been correctly
decoded. Then the probability of failure or error is bounded by


1 

r *1 

m 


m 


P(FUE) < P< 

.IT F m (i| 

U 

u A (i)Fl(i) 

. , m m 

li =1 


• 

1—1 




> + 


m _____ ___ 

+ P{ u A (i)F (i) 

, m m 
i =1 


IT F m-l (j) 

J = 1 


u : 


U a (j)F (j) 

. . m-l m-i 

!*■ 


m m 


+ P U , u , W 

i=l j=l 

1 j^i 


F ro -z <*> 


FF 

L X ^ i 
Z i j 


U 


U A (j)F 

m-£ m -c 

Zt i 

ii j 


+ . . . +P |l) A (i.) F (i ). . . A (i )F (i _) 
' . m 1 m 1 3 m-Z 3 m-Z 

i. 

J 


F_(i )F (i ) 
Z m-l Z m 




where the union with the subscript i. is over all permutations of m-1
digits taken from the set {1, 2, ..., m}. Realizing that every term
in each union is equally probable, we can upper bound P(F∪E) further by


m ■ m-1 

P(FUE) < P { TT F (i) } + (“) Pi IT F (i)} + 

. , m i . , m-x 

i=l 1 = 1 


m-2 


+ <”)p{ TT F m _ 2 (i>} +••• + C-2 )P ^ r 2 (1)F 2 (2) ^ + 

i=l 

+ m P (1)F (1)} + (m-1) (™)p{a_ _(1)F_ (1)} + 

mm 1 m-1 m-1 


f 


)}+...+ 2( m _ )p{ A_(1)F_(1) 1 . 


+ (m-2)(^)P\A _(1)F ,(1)X +...+ 2(“ x _)P\A (l)F.(l) J .(2) 

u m-^ m —L* m-Ci c* 


Since not using the state information increases the probability of not
being able to decode,



(3) 


where $F_\infty(j)$ denotes the event of failing the threshold test on the jth
of a block of m = ∞ streams (in such a case state information is
worthless). The last equality in (3) follows from the fact that if
state information is not used, decoding of any set of streams is
independent and identically distributed. Another valid upper bound is



( 4 ) 


Collecting the results (2) through (4) we get 
m-2 _

P(FUE)< Y_ <T ) L^ P ^ F " a ^ m "^ P ^ F m-/ 1) ^ +lm '‘ )P ^m-i <1)f m -i 1) J' 


To lower bound P(F∪E), let $B_\ell$ be the event that some set of $\ell < m$ streams
have been correctly decoded and passed the threshold test, and let
$C_{m-\ell}$ be the event that after $\ell$ streams have been decoded, none of the
remaining $m-\ell$ streams can be correctly decoded. Then

$$P(F \cup E) \ge P\{[B_\ell \cdot C_{m-\ell}] \cup \bar B_\ell\} \ge P\{C_{m-\ell}\}. \tag{6}$$

However, since the probability of decoding at least one of the remaining
$m-\ell$ streams is smaller than or equal to the probability of decoding at
least one of a given set of $m-\ell$ streams that satisfy the parity constraint
(because the first $\ell$ streams to be decoded will in general be the least
noisy ones), we have

$$P\{C_{m-\ell}\} \ge P\Big\{\prod_{j=1}^{m-\ell} A_{m-\ell}(j)\Big\}.$$

Since certainly

$$P\{A_k(1), A_k(2), \ldots, A_k(k-1) / A_k(k)\} \ge P\{A_k(1), A_k(2), \ldots, A_k(k-1) / \bar A_k(k)\},$$

then

$$P\{C_{m-\ell}\} \ge \big[P\{A_{m-\ell}(1)\}\big]^{m-\ell} \tag{7}$$

where $P\{A_{m-\ell}(1)\}$ denotes the probability that the first of a given set
of $m-\ell$ streams cannot be correctly decoded. Furthermore, because of
the parity constraint, if two streams remain, then either both or neither
will be correctly decoded. Hence

$$P\{C_2\} \ge P\{A_2(1) \cdot A_2(2)\} = P\{A_2(1)\}. \tag{8}$$

We therefore get from (6), (7), and (8) that

$$P(F \cup E) \ge \max\Big\{P\{A_2(1)\},\ \max_{m \ge k \ge 3} \big[P\{A_k(1)\}\big]^k\Big\}. \tag{9}$$

3. Estimates on Exponents

In this section we use the bounds (5) and (9) to estimate the
limiting behavior of $(1/\mu)\log P(F \cup E)$. We get

$$P\{F_k(1)\} \le P\{A_k(1)\} + P\{F_k(1)\bar A_k(1)\}. \tag{10}$$

Now

$$P\{A_k(1)\} \le N\, L_k(N, \mu)\, 2^{-\mu E_k(R)} \tag{11}$$

where $\mu = \mu_b \lambda$ is the truncated constraint length in bits ($\lambda$ is the
number of transmitted digits per branch), $L_k(N, \mu)$ is a slowly varying
function of its parameters whose value does not exceed 1, and $E_k(R)$ is
the undetected error exponent that corresponds to maximum likelihood
decoding of the first of k parity-constrained streams (see step 1) of
Section 1) that utilizes the received as well as the state stream digits
when the convolutional transmission rate is R (the net rate that takes
into account parity as well as stream tail degradations is
$\frac{m-1}{m}\,\frac{N}{N+\nu_b}\,R$).
The probability $P\{F_k(1)\bar A_k(1)\}$ is upper bounded by the probability
that the likelihood on the correct path ever drops by $\theta$. It has been
shown in [1] that the bound

$$P\{F_k(1)\bar A_k(1)\} \le K_1 N\, 2^{-h_k \theta} \tag{12}$$

holds where $K_1 \in (0, 1]$. For channels symmetric from the input,
$h_k$ is the solution of

$$R = -\frac{1}{h_k}\, f_k(h_k) \tag{13}$$

where

$$f_k(\zeta) = \log \sum_{y,z} w_k(y,z) \Big[\frac{w_k(y,z/0)}{w_k(y,z)}\Big]^{1-\zeta} \tag{14}$$
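The exponent $h$ in a bound of the form (12) can be computed numerically. The sketch below rests on stated assumptions: a plain BSC with equiprobable inputs and no state stream (so $w(y) = 1/2$), a per-digit correct-path increment $\Delta = \log_2(w(y/x)/w(y)) - R$ as in metric (1), and the standard exponential-martingale (Wald) argument, by which the probability that such a random walk ever drops by $\theta$ is at most $2^{-h\theta}$ when $h > 0$ solves $\log_2 E[2^{-h\Delta}] = 0$. The function names are hypothetical.

```python
import math

def drift_log2_mgf(h: float, p: float, rate: float) -> float:
    """log2 E[2**(-h*Delta)] for the correct-path increment Delta = log2(w(y/x)/w(y)) - R
    on a BSC with equiprobable inputs, where w(y) = 1/2."""
    correct = (1 - p) * (2 * (1 - p)) ** (-h)   # digit received correctly
    wrong = p * (2 * p) ** (-h)                 # digit received in error
    return h * rate + math.log2(correct + wrong)

def drop_exponent(p: float, rate: float, tol: float = 1e-12) -> float:
    """Positive root h of drift_log2_mgf(h) = 0, so that
    P{correct-path likelihood ever drops by theta} <= 2**(-h*theta)."""
    # The function is convex, zero at h = 0, negative just above 0 when R < C,
    # and equal to R > 0 at h = 1, so the positive root lies in (0, 1).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if drift_log2_mgf(mid, p, rate) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A larger rate R shrinks the drift of the correct path and hence the root h, which is why, for a fixed target exponent, the threshold $\theta$ must grow as R approaches capacity.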


Finally, $P\{A_k(1)\bar F_k(1)\}$ is the probability that some incorrect path passes
the threshold test at all depths. It is upper bounded by the probability
that the likelihoods of all initially incorrect paths exceed $-\theta$ at the
earliest point at which they rejoin the correct path (all paths are
joined with the correct path at depth $N + \nu_b$). It is then easy to show that

$$P\{A_k(1)\bar F_k(1)\} \le K_2 N\, 2^{a_k\theta - \nu[a_k R - f_k(1 - a_k)]} \tag{15}$$

where $\nu = \nu_b \lambda$ is the constraint length in transmitted digits, and
$K_2$ is a finite constant provided

R < a k R ~ *1* ^ “ a ]J c k ^ ° ’ (16) 

Since $f_k(\zeta)$ is a convex function of $\zeta$, and

$$\frac{d f_k(\zeta)}{d\zeta}\Big|_{\zeta=0} = -I_k(X;Y),$$

then relations (13) and (16) can be satisfied simultaneously provided

$$R < I_k(X;Y) = \sum_{y,z} w_k(y,z/0) \log \frac{w_k(y,z/0)}{w_k(y,z)}. \tag{17}$$

Plugging (10), (11), (12), and (15) into (5) we get 

m-2 


P(FUE) < (“) 

£=0 


n r "^ E » (R) -vn) 

min 1 (nLK 3 2 + J / 


m-i 




N[ K_2 


-pE (R) -h 0 

m-i, + 2 m ^ 


]}♦ 


m 


I 

k=2 


m 

(m-k) k K 2 N2 


^e-vi^R-f, tt -*>] 


(18) 


Let

$$\theta = \mu\gamma$$

where

$$\gamma = \max_{2 \le k \le m} \frac{1}{h_k}\, E_k(R) \tag{19}$$

and note that $(m-\ell)E_\infty(R)$ and $E_{m-\ell}(R)$ decrease and increase with $\ell$,
respectively. Also, let $k_0 \in \{2, \ldots, m\}$ be the index maximizing
$a_k\theta - \nu[a_k R - f_k(1 - a_k)]$, let $a = a_{k_0}$, and define

$$\alpha = aR - f_{k_0}(1 - a) \tag{20}$$

Then
$$P(F \cup E) \le K_4 N^{k^*} 2^{-\mu\beta_U(R)} + K_5\, 2^{\mu a\gamma - \nu\alpha} \tag{21}$$

where

$$\beta_U(R) = \min\{k^* E_\infty(R),\ E_{k^*-1}(R)\} \tag{22}$$

and

$$k^* = \min\{k : k E_\infty(R) \ge E_k(R)\}. \tag{23}$$

We see from (21) that

$$\lim_{\mu \to \infty} -\frac{1}{\mu} \log P(F \cup E) \ge \beta_U(R) \tag{24}$$

provided

$$\nu \ge \frac{\mu\,[\beta_U(R) + a\gamma]}{\alpha}. \tag{25}$$

Finally, using (9) and (11) we get that

$$P(F \cup E) \ge \max\Big\{N K_4\, 2^{-\mu E_2(R)},\ \max_{m \ge k \ge 3} (N K_4)^k\, 2^{-\mu k E_k(R)}\Big\}. \tag{26}$$

Let $\hat k$ be the integer minimizing $k E_k(R)$ over $k = 3, 4, \ldots, m$ and
define

$$\beta_L(R) = \min\{E_2(R),\ \hat k E_{\hat k}(R)\}. \tag{27}$$

Then

$$P(F \cup E) \ge K_6 N\, 2^{-\mu\beta_L(R)} \tag{28}$$

and

$$\lim_{\mu \to \infty} -\frac{1}{\mu} \log P(F \cup E) \le \beta_L(R). \tag{29}$$

We will summarize our results in the following theorem.
Theorem 1

Let $E_k(R)$ be the exponent of the probability of undetected error
corresponding to Viterbi decoding of the first of a block of k received
streams when the transmitted codewords satisfy the parity constraint.
Let $h_k$, k = 1, ..., m, be the solutions of (13), let $a_k$ maximize the
right-hand sides of (16), and let $\gamma$, a, and $\alpha$ be as defined in (19) and
(20). Let the bootstrap trellis decoder be based on the $\mu_b$-truncated
prefix of a convolutional code of constraint length $\nu$. If the stopping
threshold has value $\theta = \mu\gamma$, then there are codes whose probability
of failure or error satisfies

$$\beta_U(R) \le -\lim_{\mu \to \infty} \frac{1}{\mu} \log P(F \cup E) \le \beta_L(R) \tag{30}$$

provided $\nu \ge \mu[\beta_U(R) + a\gamma]/\alpha$. The bounds $\beta_U(R)$ and $\beta_L(R)$ are given
by (22) and (27), respectively.


4. Exponent Evaluation

The preceding theorem gives bounds on the error exponent for
Bootstrap Trellis Decoding in terms of the undetected error exponent
$E_k(R)$. In this section we show how the bounds can be evaluated.

First note that the exponent $E_k(R)$ is known only for $R \in (R_{comp}, C)$,
but that upper and lower bounds to it exist for $R \in (0, R_{comp})$. Since what
is wanted in practice is an estimate of the behavior of P(F∪E), we will
take the point of view that for $R \in (0, R_{comp})$, $E_k(R)$ is given by its
expurgated lower bound [2].





Let $w_k(y, z/x)$ be the probability that when x is transmitted, y
is received and z is the state digit, when the block of k transmitted
streams satisfies the parity constraint (see Jelinek and Cocke [3]).
Assuming a symmetric binary input channel, define the exponent functions

$$E_k^o(\sigma) = (1+\sigma) - \log \sum_{y,z} \Big[w_k(y,z/0)^{\frac{1}{1+\sigma}} + w_k(y,z/1)^{\frac{1}{1+\sigma}}\Big]^{1+\sigma} \tag{31}$$

$$E_k^x(\sigma) = \sigma - \sigma \log\Big[1 + \Big(\sum_{y,z} \sqrt{w_k(y,z/0)\, w_k(y,z/1)}\Big)^{1/\sigma}\Big] \tag{32}$$

It can be shown that $E_k^x(1) = E_k^o(1)$. Define further

$$E_k(\sigma) = \begin{cases} E_k^o(\sigma), & \sigma \in (0, 1] \\ E_k^x(\sigma), & \sigma > 1 \end{cases} \tag{33}$$

Then, having assumed the expurgated exponent as the true one, we
get for 0 < R < C [C is the capacity of the channel $w_k(y, z/x)$]

$$E_k(R) = R\sigma$$

where $\sigma$ is the solution of

$$R = \frac{1}{\sigma}\, E_k(\sigma). \tag{34}$$
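For a BSC without state information (the m = ∞ case, so the sums run over y only), (31) through (34) reduce to closed forms, and $E(R) = R\sigma$ can be computed directly. The sketch below is illustrative and assumes base-2 logarithms and a BSC with crossover probability p; on the expurgated branch ($\sigma > 1$) it solves (34) in closed form, and on $(0, 1]$ it bisects. The function names are hypothetical.

```python
import math

def e0(sigma: float, p: float) -> float:
    """(31) for a BSC with equiprobable inputs (base-2 logs, no state digit)."""
    s = 1.0 / (1.0 + sigma)
    inner = (1 - p) ** s + p ** s          # same bracket for y = 0 and y = 1
    return (1 + sigma) - math.log2(2 * inner ** (1 + sigma))

def ex(sigma: float, p: float) -> float:
    """(32) for a BSC: sigma - sigma*log2(1 + B**(1/sigma)), B = sum_y sqrt(w(y/0)w(y/1))."""
    b = 2 * math.sqrt(p * (1 - p))
    return sigma - sigma * math.log2(1 + b ** (1 / sigma))

def conv_exponent(rate: float, p: float) -> float:
    """E(R) = R*sigma, where sigma solves (34), R = E(sigma)/sigma, with E as in (33)."""
    r_comp = e0(1.0, p)
    if rate < r_comp:
        # expurgated branch, sigma > 1: 1 - log2(1 + B**(1/sigma)) = R in closed form
        b = 2 * math.sqrt(p * (1 - p))
        sigma = math.log(b) / math.log(2 ** (1 - rate) - 1)
    else:
        # E0(sigma)/sigma decreases from C (sigma -> 0) to R_comp (sigma = 1): bisect
        lo, hi = 1e-9, 1.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if e0(mid, p) / mid > rate:
                lo = mid
            else:
                hi = mid
        sigma = 0.5 * (lo + hi)
    return rate * sigma
```

For p = 0.045 this gives $R_{comp} = E^o(1) \approx 0.4996$, and for p = 0.056 it gives $R_{comp} \approx 0.454$, the value 0.45 quoted in Section 5.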

(34) thus allows us to evaluate both $\beta_L(R)$ and $\beta_U(R)$ provided we
solve the equations $R = E_k(\sigma)/\sigma$. This is impractical if the
$\beta$ exponents are wanted for all R. In that case it is best to proceed
parametrically with the help of the following theorem.



Theorem 2 


Let y £ 0 be arbitrary. 


I. The ratio p^{R)/R attains the value y at the rate 


R = maxi - E (y), min I- E (\), — E (-^7 )l} 
L V m ly k +_ L y “ k + J J 


where 


k + = min^:k s 2, ^ E* (y) < * E^J) } . 


II. The ratio |3 (R)/R attains the value y at the rate 


R = min M E*( V ), E^ + (^)} 


J l l 


where 


1 = min 


in {m, min {, : 1 , 3, * E> <*) S e‘ + ,(^>1 } . 


The proof is similar to that of Theorems 3 and 4 of P] and is 
omitted. Figures 1 through 4 evaluate $\beta_L(R)$ and $\beta_U(R)$ vs. R for
m = ∞ and compare these to the exponent $E_\infty(R)$ appropriate to
straight Viterbi decoding. The four figures apply to the BSC with
crossover probabilities p = 0.045, 0.056, 0.07, and 0.09,
respectively. It should again be stressed that R is the convolutional
rate and not the net rate. For every combination of m, N, and $\nu_b$,
the latter curves can be obtained by replotting the present ones,
taking into account the relationship

$$R_{NET} = \frac{m-1}{m}\,\frac{N}{N+\nu_b}\,R$$
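The conversion from the convolutional rate R to the net rate can be sketched as follows; the function name is hypothetical, and exact fractions are used so that the example values are exact.

```python
from fractions import Fraction

def net_rate(rate: Fraction, m: int, n_branches: int, nu_b: int) -> Fraction:
    """R_NET = ((m-1)/m) * (N/(N+nu_b)) * R.

    (m-1)/m accounts for the parity stream, N/(N+nu_b) for the tail of each stream."""
    return Fraction(m - 1, m) * Fraction(n_branches, n_branches + nu_b) * rate

print(net_rate(Fraction(1, 2), 10, 100, 0))    # tail ignored: prints 9/20, i.e. 0.45
print(net_rate(Fraction(1, 2), 10, 100, 10))   # prints 9/22
```

The first case reproduces the value $R_{NET} = (9/10)R = 0.45$ used for the rate 1/2 code in the simulation below.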



5. Simulation


The simulated bootstrap trellis decoding algorithm (BTDA) operates as
follows. First, the truncated trellis algorithm is employed to decode
each of the streams. While decoding a stream, if $L_n$ does not exceed
its previous maximum within some number of time intervals THRSH,
the decoded path will be computed by tracing back from the position
of the maximum. The digits on the decoded path will be declared
reliable up to the position which is located KBACK intervals earlier
than the position of the previous maximum.
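The stopping and traceback rule just described can be sketched as follows. This is a hypothetical helper, not Park's simulation program: L holds the best path likelihood at each time interval, positions are 0-based, and treating the whole stream as reliable when no stall occurs is an assumption of the sketch.

```python
from typing import Sequence, Tuple

def reliable_portion(L: Sequence[float], thrsh: int, kback: int) -> Tuple[int, int]:
    """Return (stop position, end of reliable prefix) for one decoding attempt.

    Decoding stops once L has not exceeded its previous maximum for `thrsh`
    intervals; digits before (position of that maximum - kback) are declared reliable."""
    best_pos, best_val = 0, L[0]
    for n in range(1, len(L)):
        if L[n] > best_val:
            best_pos, best_val = n, L[n]
        elif n - best_pos >= thrsh:
            # trace back from the maximum; keep digits up to kback intervals before it
            return n, max(best_pos - kback, 0)
    return len(L), len(L)   # assumption: no stall means the whole stream is kept
```

With THRSH = 40 and KBACK = 50, as selected below, the reliable prefix ends 50 intervals before the last maximum, so only digits far behind the point of difficulty are committed.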

Once a portion of a stream is declared reliable, the channel
state modifications will be made over that portion, and the algorithm
will go on to decode the next stream. When the m-th stream is
encountered, the parity relationship will first be used to decode the
digits in those positions where all the other (m-1) streams are declared
reliable, and then the truncated Viterbi decoder will be operated over the
undecoded digits of the m-th stream.

After decoding the m-th stream, the parity relationship will be 
used again to decode the portions where (m-1) streams are decoded 
and declared reliable. These procedures constitute the first pass 
of the algorithm. For the second pass, the last stream decoded in 
the first pass will be the first stream to be tried, and, in addition, 
the decoder will operate backwards starting from the opposite end of 
the stream. 



After decoding of a stream stops, the channel state symbols are
modified over the reliable portion according to the definitely decoded
digits in that stream. The decoder will go on to decode the next-to-last
stream of the previous pass, and so on. Passes will continue
until no further improvement in the length of the reliable stream
portion can be achieved.

Using optimization methods described in his Ph.D. thesis [4],
H. S. Park selected THRSH = 40 and KBACK = 50 for m = 10. He
simulated the algorithm on a BSC with crossover probability $p_c = 0.056$, whose
$R_{comp} = 0.45$, which is the net value of the transmission rate
($R_{NET} = (9/10)R$) of the convolutional code of rate 1/2. This allows
comparison with the straight maximum likelihood decoding algorithm (MLDA)
performance of R = 1/2 codes over a BSC with p = 0.045. The
following results were obtained:


               Hybrid BTDA                   Straight MLDA    MLDA Equivalent
   p_c     THRSH    KBACK    P_e             P_e
   .056    40       50       .00018          .0034            ν ≈ 11.5


Table 1.


The above table lists the constraint length ν necessary for the MLDA
algorithm to achieve the error performance P_e = .00018.

For meaningful statistical data on P_e for the BTDA, the running
time of the simulation program should be long enough that the simulated
value of P_e is reliable. Due to limited computer time, only 1200
blocks of 10 streams were run to count decoding errors. The BTDA
achieved the error probability 0.00018 over those 1200 blocks. In all,
240 bit errors were responsible for this figure, and these were
spread over 40 of the 1200 blocks. As many as 45 of the 240 bit errors
occurred in a single block. To achieve firmer support for the
value of P_e, additional computer time is needed to observe more of
these occasional "large error" blocks.


6. Computational Complexity of the BTDA

We shall assume that the computational complexity of the MLDA
is determined by

$$E_{ch} = (N + \mu - 1)\, 2^{\mu} \tag{35}$$

where N is the length of the information sequence and (μ - 1) is the number
of digits defining the binary trellis states in the trellis diagram.

In the BTDA, if we let T denote the average number of trials
to decode the m streams of the hybrid scheme, then the average number
of trials M per decoded information stream is given by

$$M = \frac{T}{m-1} \tag{36}$$

where (m-1) takes account of the rate reduction due to the extra
parity stream of the hybrid scheme.

If we assume that whenever the BTDA returns to decode a stream
that has already been tried, decoding starts at the beginning of that
stream, then the average number of computations $E_{ch}$ per decoded
information stream of the hybrid scheme is upper-bounded by

$$E_{ch} \le M (N + \mu - 1)\, 2^{\mu} \tag{37}$$

From the simulation program of the BTDA (ν = 10, μ = 5),
the number M is shown in Table 2 below. Thus, for N = 100 and μ = 5,

$$E_{ch} \le M \cdot (N + \mu - 1) \cdot 2^{\mu} = 1.5 \cdot (104) \cdot 2^5 < 104 \cdot 2^6 \tag{38}$$

However, as shown in the previous section, the performance
achieved by the BTDA (ν = 10, μ = 5) is almost equivalent to the
performance of the straight MLDA with ν ≈ 11, whose $E_{ch}$ is given by

$$E_{ch} = (N + \mu - 1)\, 2^{\mu} = 110 \cdot 2^{11} \qquad (N = 100,\ \mu = 11) \tag{39}$$

From Eqs. (38) and (39), the computational complexity of the BTDA
is smaller than that of the straight MLDA by more than a factor of
$2^5 = 32$.


   p_c     ν     μ    THRSH    KBACK    M      P_e       MLDA Equivalent
   .056    10    5    40       50       1.5    .00018    ν > 11.5


Table 2.
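The arithmetic of (35) through (39) can be reproduced with a short sketch (function names hypothetical). It evaluates the bound (37) with the Table 2 value M = 1.5 and the MLDA work (35) for N = 100 and μ = 11; the factor $2^5 = 32$ in the text comes from the looser bound $104 \cdot 2^6$ in (38).

```python
def mlda_work(n_info: int, mu: int) -> int:
    """(35): E_ch = (N + mu - 1) * 2**mu computations per stream."""
    return (n_info + mu - 1) * 2 ** mu

def btda_work_bound(m_trials: float, n_info: int, mu: int) -> float:
    """(37): E_ch <= M (N + mu - 1) 2**mu, assuming every retry restarts the stream."""
    return m_trials * mlda_work(n_info, mu)

btda = btda_work_bound(1.5, 100, 5)   # Table 2: M = 1.5 with mu = 5
mlda = mlda_work(100, 11)             # straight MLDA with constraint length about 11
print(btda, mlda, round(mlda / btda, 1))   # prints 4992.0 225280 45.1
```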
Figure 1: Comparison of β_L(R), β_U(R), and E_∞(R) exponents
for the BSC with crossover probability p_c = 0.045.

Figure 2: Comparison of β_L(R), β_U(R), and E_∞(R) exponents
for the BSC with crossover probability p_c = 0.056.

Figure 3: Comparison of β_L(R), β_U(R), and E_∞(R) exponents
for the BSC with crossover probability p_c = 0.07.

Figure 4: Comparison of β_L(R), β_U(R), and E_∞(R) exponents
for the BSC with crossover probability p_c = 0.09.









II-F. Three Group Bootstrap Decoding
1. Description of Code and Its Use in Bootstrapping

It is desirable to generalize bootstrap decoding to encode
transmitted streams by use of an algebraic code that has more than
one parity check. The three-group code has two parity check digits
$v_1, v_2$ and k information digits $m_1, \ldots, m_k$. Every information
digit is checked by at least one parity check digit. Without loss of
generality let


$$v_1 = \sum_{i=1}^{\ell-1} m_i, \qquad v_2 = \sum_{i=h}^{k} m_i \pmod 2, \qquad 1 \le \ell - 1 \le k, \quad 1 \le h \le k \tag{1}$$

For the code to be non-trivial, $1 \le h \le \ell - 1 \le k$ and at least one of the
outside inequalities is strict. It is convenient to define the codeword
digits $x_1, \ldots, x_{k+2}$ as follows:

$$x_1 = v_1, \qquad x_i = m_{i-1},\ \ i = 2, \ldots, k+1, \qquad x_{k+2} = v_2 \tag{2}$$


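The codeword digits may then be divided into three groups

$$\mathcal{J}_1 = \{x_1, \ldots, x_h\}, \qquad \mathcal{J}_2 = \{x_{h+1}, \ldots, x_\ell\}, \qquad \mathcal{J}_3 = \{x_{\ell+1}, \ldots, x_n\} \tag{3}$$

where n = k + 2. Let $y = y_1, \ldots, y_n$ be the received digits, and define (all sums mod 2)

$$t_1 = \sum_{i=1}^{h} y_i,\ \ u_1 = \sum_{i=1}^{h} x_i; \qquad t_2 = \sum_{i=h+1}^{\ell} y_i,\ \ u_2 = \sum_{i=h+1}^{\ell} x_i; \qquad t_3 = \sum_{i=\ell+1}^{n} y_i,\ \ u_3 = \sum_{i=\ell+1}^{n} x_i \tag{4}$$

The syndrome digits of y are

$$s_1 = t_1 \oplus t_2 \quad \text{and} \quad s_2 = t_2 \oplus t_3 \tag{5}$$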


Let $u = (u_1, u_2, u_3)$, $t = (t_1, t_2, t_3)$, $n_1 = h$, $n_2 = \ell - h$, $n_3 = n - \ell$. Assuming
that the information digits are i.i.d. with $P\{m_i = 0\} = P\{m_i = 1\} = 1/2$, then,
since by (1) and (2) $u_1 = u_2 = u_3 = m_h \oplus \cdots \oplus m_{\ell-1}$,

$$P\{u = (0,0,0)\} = P\{u = (1,1,1)\} = 1/2 \tag{6}$$

Now for $n_i \ge 1$,

$$q_{n_i}(0) = P\{t_i = u_i\} = P\{\text{an even number of the } n_i \text{ digits were received incorrectly through the channel}\} = \frac{1 + (1-2p)^{n_i}}{2} \tag{7a}$$

where p is the channel crossover probability. As a consequence,

$$q_{n_i}(1) = P\{t_i \ne u_i\} = \frac{1 - (1-2p)^{n_i}}{2} \tag{7b}$$

It will prove convenient to also define

$$q_0(0) = 1, \qquad q_0(1) = 0 \tag{8}$$


From the above, we get the relation

$$P\{t\} = \tfrac{1}{2}\big[q_{n_1}(t_1)\, q_{n_2}(t_2)\, q_{n_3}(t_3) + q_{n_1}(t_1 \oplus 1)\, q_{n_2}(t_2 \oplus 1)\, q_{n_3}(t_3 \oplus 1)\big] \tag{9}$$

where ⊕ denotes mod 2 summation, and $n_i \ge 1$.
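Formula (9) can be checked against an exhaustive enumeration of a small three-group code. The sketch below assumes a BSC with crossover probability p and i.i.d. equiprobable information digits, as in the text; the function names are hypothetical.

```python
from functools import reduce
from itertools import product
from operator import xor

def q(n: int, t: int, p: float) -> float:
    """(7a), (7b), (8): probability that the errors in n digits have parity t."""
    return (1 + (-1) ** t * (1 - 2 * p) ** n) / 2

def p_t_formula(t, sizes, p):
    """(9): P{t} = 1/2 [prod_j q_{n_j}(t_j) + prod_j q_{n_j}(t_j + 1)]."""
    same = flipped = 1.0
    for tj, nj in zip(t, sizes):
        same *= q(nj, tj, p)
        flipped *= q(nj, tj ^ 1, p)
    return 0.5 * (same + flipped)

def p_t_bruteforce(k, h, l, p):
    """Exact P{t} by enumerating all messages and error patterns of the code (1)-(2)."""
    n = k + 2
    dist = {}
    for m in product((0, 1), repeat=k):
        x = (reduce(xor, m[: l - 1], 0),) + m + (reduce(xor, m[h - 1 :], 0),)
        for e in product((0, 1), repeat=n):
            y = [a ^ b for a, b in zip(x, e)]
            t = (reduce(xor, y[:h], 0), reduce(xor, y[h:l], 0), reduce(xor, y[l:], 0))
            w = sum(e)                              # number of channel errors
            dist[t] = dist.get(t, 0.0) + 2.0 ** (-k) * p ** w * (1 - p) ** (n - w)
    return dist
```

Only $u \in \{(0,0,0), (1,1,1)\}$ occurs, which is why (9) has exactly two terms; with $h = 1$ the same check exercises $n_1 = 1$.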

We will be able to show below that for the three group code

$$\frac{P\{\mathbf{y}/x_i\}}{P\{\mathbf{y}\}} = \frac{P\{y_i, t_1, t_2, t_3/x_i\}}{P\{y_i, t_1, t_2, t_3\}} \qquad \text{if } n_1 \ge 1 \tag{10}$$

Since the left-hand ratio is the one that enters into the likelihood
calculation for bootstrap decoding, the receiver will be interested in the
probabilities $P\{y_i, t_1, t_2, t_3/x_i\}$. Suppose $i \in [1, \ldots, n_1]$. Then


$$P\{y_i, t/x_i\} = \sum_u P\{y_i, t, u/x_i\} = \sum_u P\{y_i, t/u, x_i\}\, P\{u/x_i\} = \sum_u P\{y_i, t_1/x_i, u_1\}\, P\{t_2/u_2\}\, P\{t_3/u_3\}\, P\{u/x_i\} \tag{11}$$

But

$$P\{u/x_i\} = P\{u_1/x_i\}\, P\{u_2, u_3/u_1\} = P\{u_1/x_i\}\, \delta(u_2, u_1)\, \delta(u_3, u_1) \tag{12}$$

where δ( , ) is the Kronecker delta function, and

$$P\{u_1/x_i\} = \begin{cases} 1/2, & n_1 > 1 \\ \delta(u_1, x_i), & n_1 = 1 \end{cases} \tag{13}$$

Furthermore, for $n_1 > 1$ ($t_1'$ and $u_1'$ are the sums over the first group excluding
the ith variable),

$$P\{y_i, t_1/x_i, u_1\} = w(y_i/x_i)\, P\{t_1' = t_1 \oplus y_i \,/\, u_1' = u_1 \oplus x_i\} = w(y_i/x_i)\, q_{n_1-1}(t_1 \oplus u_1 \oplus y_i \oplus x_i) \tag{14}$$

Thus it follows from (11) through (14) that as long as $n_1 > 1$, $n_2 \ge 1$,
$n_3 \ge 1$, then
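$$P\{y_i, t/x_i\} = \tfrac{1}{2}\sum_u q_{n_1-1}(t_1 \oplus u_1 \oplus y_i \oplus x_i)\, q_{n_2}(t_2 \oplus u_2)\, q_{n_3}(t_3 \oplus u_3)\, w(y_i/x_i)\, \delta(u_2, u_1)\, \delta(u_3, u_1)$$

$$= \tfrac{1}{2}\, w(y_i/x_i)\big[q_{n_1-1}(t_1 \oplus y_i \oplus x_i)\, q_{n_2}(t_2)\, q_{n_3}(t_3) + q_{n_1-1}(t_1 \oplus y_i \oplus x_i \oplus 1)\, q_{n_2}(t_2 \oplus 1)\, q_{n_3}(t_3 \oplus 1)\big] \tag{15}$$

As a consequence,

$$P\{y_i, t\} = \tfrac{1}{2}\big[P\{y_i, t/0\} + P\{y_i, t/1\}\big]$$

$$= \tfrac{1}{4}\Big\{\big[w(y_i/0)\, q_{n_1-1}(t_1 \oplus y_i) + w(y_i/1)\, q_{n_1-1}(t_1 \oplus y_i \oplus 1)\big]\, q_{n_2}(t_2)\, q_{n_3}(t_3) + \big[w(y_i/0)\, q_{n_1-1}(t_1 \oplus y_i \oplus 1) + w(y_i/1)\, q_{n_1-1}(t_1 \oplus y_i)\big]\, q_{n_2}(t_2 \oplus 1)\, q_{n_3}(t_3 \oplus 1)\Big\}$$

$$= \tfrac{1}{4}\big[q_{n_1}(t_1)\, q_{n_2}(t_2)\, q_{n_3}(t_3) + q_{n_1}(t_1 \oplus 1)\, q_{n_2}(t_2 \oplus 1)\, q_{n_3}(t_3 \oplus 1)\big] \tag{16}$$

so that for $n_1 > 1$, $n_2 \ge 1$, $n_3 \ge 1$,

$$\frac{P\{y_i, t/x_i\}}{P\{y_i, t\}} = \frac{2\, w(y_i/x_i)\big[q_{n_1-1}(t_1 \oplus y_i \oplus x_i)\, q_{n_2}(t_2)\, q_{n_3}(t_3) + q_{n_1-1}(t_1 \oplus y_i \oplus x_i \oplus 1)\, q_{n_2}(t_2 \oplus 1)\, q_{n_3}(t_3 \oplus 1)\big]}{q_{n_1}(t_1)\, q_{n_2}(t_2)\, q_{n_3}(t_3) + q_{n_1}(t_1 \oplus 1)\, q_{n_2}(t_2 \oplus 1)\, q_{n_3}(t_3 \oplus 1)} \tag{17}$$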
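Formulas (10) and (17) together say that the true per-digit likelihood ratio $P\{\mathbf{y}/x_i\}/P\{\mathbf{y}\}$ depends on the received column only through $y_i$ and $(t_1, t_2, t_3)$. The sketch below checks this by brute force for digits of the first group, assuming a BSC with crossover probability p and equiprobable information digits; the function names and code parameters are hypothetical.

```python
from functools import reduce
from itertools import product
from operator import xor

def q(n, t, p):
    """(7a), (7b), (8): probability that the errors in n digits have parity t."""
    return (1 + (-1) ** t * (1 - 2 * p) ** n) / 2

def ratio_17(xi, yi, t, sizes, p):
    """(17) for a digit x_i of the first group: P{y_i,t/x_i} / P{y_i,t}."""
    t1, t2, t3 = t
    n1, n2, n3 = sizes
    w = 1 - p if xi == yi else p                 # BSC transition w(y_i/x_i)
    c = t1 ^ yi ^ xi
    num = (q(n1 - 1, c, p) * q(n2, t2, p) * q(n3, t3, p)
           + q(n1 - 1, c ^ 1, p) * q(n2, t2 ^ 1, p) * q(n3, t3 ^ 1, p))
    den = (q(n1, t1, p) * q(n2, t2, p) * q(n3, t3, p)
           + q(n1, t1 ^ 1, p) * q(n2, t2 ^ 1, p) * q(n3, t3 ^ 1, p))
    return 2 * w * num / den

def ratio_bruteforce(i, k, h, l, p):
    """P{y/x_i} / P{y} for every received word y, by enumerating the code (1)-(2)."""
    n = k + 2
    joint = {}                                    # (x_i, y) -> P{x_i, y}
    for m in product((0, 1), repeat=k):
        x = (reduce(xor, m[: l - 1], 0),) + m + (reduce(xor, m[h - 1 :], 0),)
        for y in product((0, 1), repeat=n):
            d = sum(a ^ b for a, b in zip(x, y))  # number of channel errors
            pr = 2.0 ** (-k) * p ** d * (1 - p) ** (n - d)
            joint[(x[i], y)] = joint.get((x[i], y), 0.0) + pr
    ratios = {}
    for (xi, y), pj in joint.items():
        py = joint.get((0, y), 0.0) + joint.get((1, y), 0.0)
        ratios[(xi, y)] = pj / (0.5 * py)         # every code digit has P{x_i} = 1/2
    return ratios
```

Running the comparison with h = 1 also exercises the case $n_1 = 1$, for which the text shows that (17) remains valid once definition (8) is used.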



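Since for $n_1 = 1$,

$$P\{y_i, t_1/x_i, u_1\} = \begin{cases} w(y_i/x_i), & \text{if } y_i = t_1 \\ 0, & \text{if } y_i \ne t_1 \end{cases}$$

then

$$P\{y_i, t/x_i\} = \begin{cases} w(y_i/x_i)\, q_{n_2}(t_2 \oplus x_i)\, q_{n_3}(t_3 \oplus x_i), & \text{if } y_i = t_1 \\ 0, & \text{if } y_i \ne t_1 \end{cases}$$

Assuming the case $t_1 \oplus y_i = 0$, then

$$q_1(t_1 \oplus x_i) = w(y_i/x_i)$$

and $t_1 \oplus y_i \oplus x_i = x_i$. Thus (17) is valid if $n_1 \ge 1$, provided definition
(8) is used.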

Relation (15) was obtained under the assumption that $i \in [1, \ldots, n_1]$.
If $n_1 + 1 \le i \le n_1 + n_2$, we need only interchange $n_1$ and $t_1$ with $n_2$ and $t_2$ in
(15). The interchange of $n_1$ and $t_1$ with $n_3$ and $t_3$ preserves the validity
of (15) for $n_1 + n_2 + 1 \le i \le n$.

It follows from (9), (10), and (15) that if $n_1 \ge 1$, the likelihood
used in bootstrap decoding with a three group algebraic code is a function
of $y_i$, $x_i$ and the state variables $(t_1, t_2, t_3, n_1, n_2, n_3)$. We will see that
these variables will also be sufficient if all the digits of one or two
of the three groups have been decoded. The needed adjustment of the
state variable values as the decoding proceeds is as follows:

At the beginning, when no digit in a column has yet been decoded,
(15) and (16) are used directly. Suppose, w.l.o.g., that $y_i$
is decoded as $x_i$. Then a new $t_1 = t_1 \oplus x_i \oplus y_i$ is obtained and used in (15) and
(16) with $n_1$ replaced by $n_1 - 1$. This process continues until all digits
of some set $\mathcal{J}_j$ have been decoded. W.l.o.g. assume that such a set is
$\mathcal{J}_2$, that the new t-values are $t_1, t_2, t_3$, and that $n_1$ and $n_3$ digits remain
undecoded in $\mathcal{J}_1$ and $\mathcal{J}_3$. Assuming that no error was committed, $t_2 = u_2$,
and when decoding $y_i$ for $1 \le i \le n_1$ the value of $t_3$ becomes irrelevant
and only those of $t_1$ and $t_2$ count. Thus the numerator in (10) is replaced by

$$P\{y_i, t_1/x_i, u_2 = t_2\} = P\{y_i, t_1/x_i, u_1 = t_2\} = w(y_i/x_i)\, q_{n_1-1}(t_1 \oplus t_2 \oplus y_i \oplus x_i) \tag{18a}$$

for $n_1 > 1$. Similarly, the denominator of (10) is replaced by

$$P\{y_i, t_1/u_2 = t_2\} = P\{y_i, t_1/u_1 = t_2\} = \tfrac{1}{2}\, q_{n_1}(t_1 \oplus t_2) \tag{18b}$$

When $n_1 = 1$, the remaining $y_i$ can be decoded algebraically from the relation

$$x_i = t_2 \tag{19}$$

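We now observe from formulas (15) and (18a) that if in the former
we set $n_2 = 0$ and use definition (8), we get the relation

$$P\{y_i, t/x_i\} = \tfrac{1}{2}\, w(y_i/x_i)\, q_{n_1-1}(t_1 \oplus t_2 \oplus y_i \oplus x_i)\, q_{n_3}(t_3 \oplus t_2) = \tfrac{1}{2}\, q_{n_3}(t_3 \oplus t_2)\, P\{y_i, t_1/x_i, u_2 = t_2\} \tag{20}$$

Similarly, setting $n_2 = 0$ in formula (16), we obtain

$$P\{y_i, t\} = \tfrac{1}{4}\, q_{n_1}(t_1 \oplus t_2)\, q_{n_3}(t_3 \oplus t_2) = \tfrac{1}{2}\, q_{n_3}(t_3 \oplus t_2)\, P\{y_i, t_1/u_2 = t_2\} \tag{21}$$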
The ratio of probabilities (20) to (21) is thus equal to the ratio of (18a) to (18b), so
formula (17), evaluated with $n_2 = 0$, remains valid even if one of the groups is completely
decoded. In fact, if $n_1 = 1$ and $n_2 = 0$, (17) becomes [note that $t_1 = y_i$ if
$n_1 = 1$]

$$\frac{2\, w(y_i/x_i)\, q_0(x_i \oplus t_2)}{q_1(t_1 \oplus t_2)} = \begin{cases} 2, & \text{if } x_i = t_2 \\ 0, & \text{if } x_i \ne t_2 \end{cases}$$

so that straightforward sequential decoding using the likelihood function
based on (17) will force the decoder to select the path on which (19) is
satisfied at each depth.

As seen from above, the value of $t_3$ is irrelevant once all digits of
$\mathcal{J}_2$ have been decoded and those of $\mathcal{J}_1$ are being decoded. Of course, when the
latter task is complete, decoding of $\mathcal{J}_3$ starts, which will depend on $t_3$
and $t_1$ [note that since $u_1 \oplus u_2 = 0$, then $t_1 = t_2$ when $\mathcal{J}_1$ and $\mathcal{J}_2$ have been
decoded] in the same way that the just described decoding was dependent on
$t_1$ and $t_2$.

2. Proof of Formula (10)

Because of the symmetry of the situation, it is obviously sufficient
to prove formula (10) for i = n. Let us define the set

$$\mathcal{V}(u_1, u_2, u_3) = \Big\{x_1, \ldots, x_{n-1} : \sum_{i=1}^{n_1} x_i = u_1,\ \sum_{i=n_1+1}^{n_1+n_2} x_i = u_2,\ \sum_{i=n_1+n_2+1}^{n-1} x_i = u_3\Big\}$$


Ptjyx n l = w(y n /x n ) [ y — 

xe°\/(0,0,x ) j=l 


w(y ./*.) + 

3 J 
n-1


♦I 

xeV(l,l,x n ®L) j=l 


v( W 


( 21 *) 


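Using formula (9) of Jelinek and Cocke [1] and defining

$$f^+(y) = w(y/0) + w(y/1) = 1, \qquad f^-(y) = w(y/0) - w(y/1) = \begin{cases} 1-2p & \text{if } y = 0 \\ 2p-1 & \text{if } y = 1 \end{cases} \qquad (25)$$

we get that

$$\begin{aligned}
\sum_{\mathbf{x}\in V(0,0,x_n)}\ \prod_{j=1}^{n-1} w(y_j/x_j) &= \frac{1}{2}\Big[\prod_{j=1}^{n_1} f^+(y_j) + \prod_{j=1}^{n_1} f^-(y_j)\Big]\cdot\frac{1}{2}\Big[\prod_{j=n_1+1}^{n_1+n_2} f^+(y_j) + \prod_{j=n_1+1}^{n_1+n_2} f^-(y_j)\Big]\\
&\quad\cdot\frac{1}{2}\Big[\prod_{j=n_1+n_2+1}^{n-1} f^+(y_j) + (-1)^{x_n}\prod_{j=n_1+n_2+1}^{n-1} f^-(y_j)\Big]\\
&= \frac{1}{8}\{1+(-1)^{t_1}(1-2p)^{n_1}\}\{1+(-1)^{t_2}(1-2p)^{n_2}\}\{1+(-1)^{t_3\oplus y_n\oplus x_n}(1-2p)^{n_3-1}\}\\
&= q_{n_1}(t_1)\, q_{n_2}(t_2)\, q_{n_3-1}(t_3\oplus y_n\oplus x_n)
\end{aligned} \qquad (26)$$

where t_1, t_2, t_3 are given by (4). Similarly,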


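$$\sum_{\mathbf{x}\in V(1,1,x_n\oplus 1)}\ \prod_{j=1}^{n-1} w(y_j/x_j) = q_{n_1}(t_1\oplus 1)\, q_{n_2}(t_2\oplus 1)\, q_{n_3-1}(t_3\oplus y_n\oplus x_n\oplus 1) \qquad (27)$$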



It follows therefore from (24), (26), (27), and (15) that

$$P[\mathbf{y}/x_n] = w(y_n/x_n)\sum_{u=0}^{1} q_{n_1}(t_1\oplus u)\, q_{n_2}(t_2\oplus u)\, q_{n_3-1}(t_3\oplus y_n\oplus x_n\oplus u) = 2^{-(n-2)}\, P[y_n, \mathbf{t}/x_n] \qquad (28)$$

Averaging (28) over x_n results in

$$P[\mathbf{y}] = 2^{-(n-2)}\, P[y_n, \mathbf{t}] \qquad (29)$$

Formula (10) then follows from (28) and (29).
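The passage from (24) to (26) and (27) rests on one identity. For the BSC, summing ∏ w(y_j/x_j) over all binary x_1, ..., x_n with mod 2 sum u gives ½[∏ f⁺(y_j) + (-1)^u ∏ f⁻(y_j)]. This equals q_n(t ⊕ u) when t is the mod 2 sum of the y_j and q_n is defined as in Section 5. The short Python check below compares a brute-force sum with both closed forms. The function names are ours, and it is only a numerical sanity test, not part of the original derivation.

```python
from itertools import product


def w(y, x, p):
    """BSC transition probability w(y/x)."""
    return 1 - p if y == x else p


def q(n, t, p):
    """q_n(t): probability that n BSC noise digits have mod 2 sum t."""
    return (1 + (-1) ** t * (1 - 2 * p) ** n) / 2


def parity_sum(ys, u, p):
    """Brute force: sum over all x with x_1 + ... + x_n = u (mod 2) of prod w(y_j/x_j)."""
    total = 0.0
    for xs in product((0, 1), repeat=len(ys)):
        if sum(xs) % 2 == u:
            term = 1.0
            for y, x in zip(ys, xs):
                term *= w(y, x, p)
            total += term
    return total


def closed_form(ys, u, p):
    """(1/2)[prod f+(y_j) + (-1)^u prod f-(y_j)], using f+ = 1 and f-(y) = (1-2p)(-1)^y."""
    prod_minus = 1.0
    for y in ys:
        prod_minus *= (1 - 2 * p) * (-1) ** y
    return (1 + (-1) ** u * prod_minus) / 2
```

Applied group by group, with u fixed by the common parity, this gives the three factors of (26) and (27).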



3. Description of Likelihood Table

Obviously, the likelihood

$$\log \frac{P(\mathbf{y}/x_i, \mathbf{x}')}{P(\mathbf{y}/\mathbf{x}')} \qquad (30)$$

(where x' is the vector of digits already decoded) would not actually be computed from scratch during the process of bootstrap decoding based on the three-group code. Rather, the values of (30) would be stored in a table whose arguments would be the parameters

$$x_i \oplus y_i,\ n_1,\ n_2,\ n_3,\ t_1,\ t_2,\ t_3,\ h \qquad (31)$$


where h denotes the group membership of x_i (i.e., x_i belongs to group h), n_j denotes the number of digits in the jth group still left to be decoded, and t_j denotes the adjusted mod 2 sum of the jth received group (i.e., if the digits x_{i_1}, ..., x_{i_l} of group j have been decoded and y_{m_1}, ..., y_{m_r} are yet to be decoded, then t_j = x_{i_1} ⊕ ... ⊕ x_{i_l} ⊕ y_{m_1} ⊕ ... ⊕ y_{m_r}).

The table would be computed with the help of formula (17). Obviously, it would contain a lot of symmetries which could be eliminated if storage were a factor. For instance, the parameter h of (31) is not needed if by convention y_i and x_i are always members of the first group. The likelihood would then be of the form

$$\lambda(x_i \oplus y_i,\ t_1, t_2, t_3,\ n_1, n_2, n_3) \qquad (32)$$

with the first four parameters binary. A further reduction in storage size is attainable by noting that (32) is invariant to an interchange of (t_2, n_2) with (t_3, n_3).
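As a sketch of how table (32) might be filled in, the Python below makes an assumption of its own. Formula (17) is not reproduced in this section, so the numerator form used here is derived by analogy with (24) to (27) rather than quoted from it. For a digit of group 1, the sketch takes the numerator of (30) to be proportional to w(y_i/x_i) Σ_u q_{n_1-1}(t_1 ⊕ e ⊕ u) q_{n_2}(t_2 ⊕ u) q_{n_3}(t_3 ⊕ u), with e = x_i ⊕ y_i, and the denominator to be its average over the two values of x_i. The function names and the base-2 logarithm are illustrative choices. The checks reproduce two properties stated in the text: invariance under (t_2, n_2) ↔ (t_3, n_3), and the forced decision when n_1 = 1 and n_2 = 0 (ratio 2, i.e. one bit, for the consistent digit and 0 for the other).

```python
import math
from itertools import product


def q(n, t, p):
    """q_n(t): probability that n BSC noise digits have mod 2 sum t."""
    return (1 + (-1) ** t * (1 - 2 * p) ** n) / 2


def numerator(e, t1, t2, t3, n1, n2, n3, p):
    """Assumed form of P[y_i, t / x_i] (up to a constant) for a digit of group 1.

    e = x_i XOR y_i, and u runs over the common parity of the three groups."""
    w = p if e else 1 - p                      # w(y_i/x_i) on the BSC
    return w * sum(q(n1 - 1, t1 ^ e ^ u, p) * q(n2, t2 ^ u, p) * q(n3, t3 ^ u, p)
                   for u in (0, 1))


def likelihood(e, t1, t2, t3, n1, n2, n3, p):
    """Table entry lambda(x_i + y_i, t1, t2, t3, n1, n2, n3) of (32), in bits."""
    num = numerator(e, t1, t2, t3, n1, n2, n3, p)
    # Denominator: average of the numerator over both values of x_i (fixed y_i).
    den = 0.5 * (numerator(0, t1, t2, t3, n1, n2, n3, p)
                 + numerator(1, t1, t2, t3, n1, n2, n3, p))
    return math.log2(num / den) if num > 0 else -math.inf


def build_table(max_n, p):
    """Precompute all entries with 1 <= n1 <= max_n and 0 <= n2, n3 <= max_n."""
    table = {}
    for n1 in range(1, max_n + 1):
        for n2 in range(max_n + 1):
            for n3 in range(max_n + 1):
                for e, t1, t2, t3 in product((0, 1), repeat=4):
                    table[(e, t1, t2, t3, n1, n2, n3)] = likelihood(
                        e, t1, t2, t3, n1, n2, n3, p)
    return table
```

Storing only n_2 ≤ n_3 and swapping (t_2, n_2) with (t_3, n_3) on lookup would halve the table, as the text suggests.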
4. Decoding Strategy of the Bootstrap Algorithm

The convolutionally encoded streams belong to three groups. By convention n_1 ≤ n_2 ≤ n_3. There is a parameter KRANK(j) which ranks the groups in "desirability" of decoding. At the start KRANK(j) = j. The general idea is to work on all streams of KRANK(1) until they have either all been successfully decoded or until every one of those streams of KRANK(1) that have not been decoded has been attempted (in sequence) without success.

In the latter case streams of KRANK(2) are tried, and if this fails then streams of KRANK(3). In case of such a "complete" failure, another decoding attempt is made with increased values of the ISTOP and KSTACK parameters.

As soon as any stream of some group LNOW is decoded, KRANK(1) is set equal to LNOW, and KRANK(2) is set equal to that remaining group that has the smallest number of undecoded streams. The last group is then labeled KRANK(3).

Originally, the parameter KPHASE is set equal to 1. When a group has been completely decoded, KPHASE is set equal to 2, KRANK(3) = LNOW, and KRANK(1) is set equal to that remaining group that has the smallest number of undecoded streams. When two groups have been completely decoded, KPHASE is set equal to 3, KRANK(1) = KRANK(2), KRANK(2) = KRANK(3), KRANK(3) = LNOW.

A decoding attempt on a stream is "successful" if depth LTRACK was reached by the decoder. In this case all digits of that stream are considered definitely decoded. Otherwise the attempt is "unsuccessful" and digits up to depth IMAX - LBACK are considered definitely decoded. If a decoding error takes place, the algorithm halts and an UNSUCCESSFUL CONCLUSION is declared.

To aid in the understanding of the Fortran listing of the algorithm 
we give a glossary of some key parameters that are peculiar to the three-group 
bootstrap scheme. 
LNOW - current group being decoded.

LNOW2, LNOW3 - the other two groups

KPHASE = 1 + number of completely decoded groups

KRANK(j) - the jth most "desirable" group. Originally LNOW = KRANK(1). Also, if a stream is completely decoded, the group to which it belongs, LNOW, becomes the most desirable one, i.e., KRANK(1) = LNOW. The remaining order is that of group size if KPHASE = 1. If KPHASE = 2, then KRANK(2) is equal to the other undecoded group.

KLEFT(I) - number of undecoded streams within the Ith group.

KNEXT - the order of the stream within the group LNOW which is to be decoded next.

LGRP - the order of the group currently decoded, i.e., LNOW = KRANK(LGRP) (1 ≤ LGRP ≤ 4 - KPHASE)

KROUND - number of streams within the group that the decoder attempted to decode without success since the last change of LGRP.

LROUND - number of times LGRP attained its maximal value without the decoding of any of the attempted streams advancing by more than LBACK + bo branches.
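A hypothetical Python rendering of the KRANK/KPHASE bookkeeping may help when reading the glossary. It is not the project's Fortran listing. The class and method names are ours, groups are numbered 1 to 3, and the ordering after a stream decode in phase 2 (undecoded group ahead of the completed one) is our reading of the KRANK(j) entry. Only the ranking updates are modelled. The stack decoder itself, ISTOP, KSTACK, LTRACK and the retry logic are left out.

```python
class BootstrapRanking:
    """Sketch of the KRANK / KPHASE / KLEFT updates of the three-group scheme."""

    def __init__(self, kleft):
        self.kleft = dict(kleft)   # KLEFT: group -> number of undecoded streams
        self.krank = [1, 2, 3]     # KRANK(1..3); initially KRANK(j) = j
        self.kphase = 1            # 1 + number of completely decoded groups

    def stream_decoded(self, lnow):
        """Update the ranking after a stream of group LNOW has been decoded."""
        self.kleft[lnow] -= 1
        others = [g for g in self.krank if g != lnow]
        if self.kleft[lnow] > 0:
            # LNOW becomes the most desirable group: KRANK(1) = LNOW.
            if self.kphase == 1:
                # Remaining groups ordered by number of undecoded streams.
                others.sort(key=lambda g: self.kleft[g])
            else:
                # Undecoded group ahead of completely decoded ones (stable sort).
                others.sort(key=lambda g: self.kleft[g] == 0)
            self.krank = [lnow] + others
        elif self.kphase == 1:
            # First group completed: KRANK(3) = LNOW and KRANK(1) is the
            # remaining group with the fewest undecoded streams.
            self.kphase = 2
            others.sort(key=lambda g: self.kleft[g])
            self.krank = others + [lnow]
        elif self.kphase == 2:
            # Second group completed. When KRANK(1) = LNOW this is the shift
            # KRANK(1) = KRANK(2), KRANK(2) = KRANK(3), KRANK(3) = LNOW.
            self.kphase = 3
            remaining = [g for g in others if self.kleft[g] > 0]
            done = [g for g in others if self.kleft[g] == 0]
            self.krank = remaining + done + [lnow]
        else:
            self.kphase = 4        # all three groups decoded
```

For example, starting from KLEFT = {1: 2, 2: 3, 3: 4}, decoding one stream of group 2 gives KRANK = [2, 1, 3], and completing group 1 later moves it to KRANK(3).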
5. An Upper Bound on the Moments of the Decoding Effort for Three Group Bootstrap Decoding

The analysis of this section was developed by D. Costello while he was a research associate employed by the contract.

Jelinek and Cocke [1] have developed an upper bound on the moments of the decoding effort for bootstrap decoding using a single parity stream. We will extend that analysis to the case of three group bootstrap decoding. Emphasis will be placed on those portions of the argument which differ from the original argument. In addition, for simplicity's sake we will restrict attention to the BSC.

First of all, assume there are n_i streams left to be decoded in group i, i = 1, 2, 3. Then let N_i(n) be the number of steps necessary to decode any given stream in group i when the step allocation is M = 1 and n = (n_1, n_2, n_3). Applying well-known results about ordinary sequential decoding, we can conclude that

$$P[N_i(\mathbf{n}) > \ell] \le K(R,v)\,(r+t)\,\ell^{-\sigma_i} \qquad (1)$$

where r + t is the length of the information sequence and K(R,v) is a function of rate R and constraint length v which is finite if v is finite and σ_i satisfies

V°i> .V 2 > 

R < — for R 2 - 3 — 


or 



for R ^ 


V 2 ) 

2 


( 2 ) 


In (2), E_n(σ) is the concave, positive, increasing function of σ defined as follows:
Let k = n_1 + n_2 + n_3 and label the k received digits left to be decoded as y_1, y_2, ..., y_{n_1}, y_{n_1+1}, ..., y_{n_1+n_2}, ..., y_{n_1+n_2+n_3} = y_k. Define y = (y_1, ..., y_k) and y' = (y_1, ..., y_{k-1}), and assume that the kth stream is in group 3 to be decoded. Then

since P(y|x_k) depends only on y_k and the pair of syndrome (state) digits s = (s_1, s_2). Noting that P(y|x_k, y_k, s) = P(y|y_k, s) = 2^{-(k-3)}, and substituting for P(y_k, s|x_k) from the analysis of the three group code, we obtain the rather long but straightforward expression

V°3 ) “V lo ^| (1 ' p) [ <1 n 1 (0)<1 n 2 (0) %-l (0)+qn i (1)qn 2 (1)qn 5- l(1) | 3 

+ f(s ( 0 > % ( 0 ) s- l( 1 ) 4 % ( 1 ) s ( 1 , s- l( 0 j} 1/1+0 ’) 1+ ^ 
+ (( (l ' p) K (0) s ( 0 ) s- i( 1 )+qn i (1) s ( 1 > s- i(0) ] [ 5 

4 (s(% ( « > v ( ^ (i >s (i) v ( 1 ) )f lt t +05 

+ (l <1 - p) i' ln i ci)<ln2 ^ <:))qn ^- l(0)+qni(0)q n 2(1)qn ^" 1<1) 3^ 1/1+a ^ 

H l'l l/l+a^\l+a, 

9 ni (l)qn 2 ( 0 ) q n r l( 1 ) +q n :L ( 0 ) q n2 ( ;L ) q n 5 -l( 0 )]| j 

+ j| ( 1 - p) ( q n 1 ( 1 )q ns ( 0 )q n,- l( 1 )+q n 1 ( 0 )q n 2 ( 1 )q n 5 -l (0) ]| 1/1+a5 

+ f(s ( 1 ) s ( 0 ) s- i( 0 ) 4 % < 0 > s ( 1 , s- i( 1 ) j 1/1+Oj ) 1+8j 

' / ti \ 
where p is the channel crossover probability and

$$q_n(0) = \frac{1 + (1-2p)^n}{2}, \qquad q_n(1) = \frac{1 - (1-2p)^n}{2}.$$

Clearly, E_n(σ_1) and E_n(σ_2) are defined in a similar way.
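Expression (4) is easier to evaluate, and to check, in compact form. Each of its four terms belongs to one syndrome value s = (s_1, s_2), and the bracketed sums can be written A_s(e) = Σ_b q_{n_1}(s_1 ⊕ b) q_{n_2}(b) q_{n_3-1}(s_2 ⊕ b ⊕ e). Here e = 0 goes with the factor (1 - p) and e = 1 with p. This indexing is ours, chosen so that it reproduces the four terms. The Python sketch below assumes base-2 logarithms. Two sanity checks follow from the formula itself. First, E_n(0) = 0. Second, for n = (0, 0, 1) the digit is fixed algebraically and E_n(σ) = σ. Finally, as all n_j grow, q_n(t) tends to 1/2 and E_n approaches the ordinary-channel function E_∞ of (8), which is written out in the same sketch for comparison.

```python
import math


def q(n, t, p):
    """q_n(t): probability that n BSC noise digits have mod 2 sum t."""
    return (1 + (-1) ** t * (1 - 2 * p) ** n) / 2


def A(s1, s2, e, n, p):
    """Bracketed sum of (4) for syndrome (s1, s2) and noise digit e on the group-3 digit."""
    n1, n2, n3 = n
    return sum(q(n1, s1 ^ b, p) * q(n2, b, p) * q(n3 - 1, s2 ^ b ^ e, p)
               for b in (0, 1))


def E_n(sigma, n, p):
    """E_n(sigma_3) of (4) for a digit of group 3; base-2 logarithm assumed."""
    r = 1 / (1 + sigma)
    total = 0.0
    for s1 in (0, 1):
        for s2 in (0, 1):
            inner = ((1 - p) * A(s1, s2, 0, n, p)) ** r + (p * A(s1, s2, 1, n, p)) ** r
            total += inner ** (1 + sigma)
    return sigma - math.log2(total)


def E_inf(sigma, p):
    """Gallager's function for the BSC with equiprobable inputs (ordinary decoding)."""
    r = 1 / (1 + sigma)
    return sigma - (1 + sigma) * math.log2((1 - p) ** r + p ** r)
```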

Next let σ_i(n) be the least upper bound on the numbers σ_i satisfying (2), i.e., σ_i(n) is the solution of


E n ( a i(ll)) 


E n (2) 


o ± (n 


i R < 0 


E n (2) 


= — Ef 


for 0 < R < 


E n (2) 


Now choose k(R) to be the largest positive integer such that

$$k(R)\,\sigma(\infty) < \min_{\mathbf{n}\in\mathcal{J}} \max\{\sigma_1(\mathbf{n}), \sigma_2(\mathbf{n}), \sigma_3(\mathbf{n})\} \qquad (7)$$

where 𝒥 = {n = (n_1, n_2, n_3) | n_1 + n_2 + n_3 = k(R)} and σ(∞) is the Pareto exponent which would be obtained with ordinary sequential decoding, i.e., σ(∞) is the solution of
„ E (a(°o)) E (2) 




E „(2 ) < 

for — § R < C 


, , E »(2) 


for 0 < R < 


E„(2) 


where 


$$E_\infty(\sigma) = \sigma - (1+\sigma)\log\big[(1-p)^{1/(1+\sigma)} + p^{1/(1+\sigma)}\big]$$
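For the ordinary sequential decoding part, σ(∞) is obtained from E_∞. The Python sketch below covers only the relation R = E_∞(σ)/σ, solved by bisection. It does not reproduce the low-rate branch of (8), and the base-2 logarithm, the search interval (0, 50] and the function names are our assumptions. Bisection is safe here because E_∞ is concave with E_∞(0) = 0, so E_∞(σ)/σ decreases in σ. As σ → 0 the ratio tends to the capacity 1 - h(p), so no positive root exists when R is at or above capacity.

```python
import math


def E_inf(sigma, p):
    """E_infinity(sigma) for the BSC, base-2 logarithm assumed."""
    r = 1 / (1 + sigma)
    return sigma - (1 + sigma) * math.log2((1 - p) ** r + p ** r)


def pareto_exponent(R, p, hi=50.0, tol=1e-12):
    """Solve R = E_inf(sigma)/sigma for sigma in (0, hi] by bisection.

    Returns 0.0 if R is at or above capacity and hi if the root lies beyond hi."""
    def f(s):
        return E_inf(s, p) / s - R

    lo = 1e-9
    if f(lo) <= 0:              # no positive exponent (R >= capacity)
        return 0.0
    if f(hi) > 0:               # root beyond the search interval
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:          # E/sigma still above R: root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

At R = E_∞(1), the cutoff rate R_0, the sketch returns σ = 1, the familiar boundary for bounded average computation.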


If there are originally m streams of digits to decode, we wish to modify the three group bootstrap decoding algorithm as follows:
(1) Decode m-k(R) streams by ordinary sequential decoding without the help of the two parity streams and with the step allocation M = 1.

(2) Decode the remaining k(R) streams with the help of the parity streams using the three group bootstrap decoding algorithm.

We now briefly highlight the arguments leading to the desired bound. The details will not be pursued since they closely follow the development in Jelinek and Cocke [1]. In part (1) of the modified algorithm, the easiest m-k(R) streams are decoded by ordinary sequential decoding. If L* is the number of steps needed to decode the hardest of the decoded streams, then P(L* > ℓ) is upper bounded by the probability that there is a set of k(R) + 1 streams that need more than ℓ steps each to decode by ordinary sequential decoding. Since the decoding of the first m-k(R) streams is independent, the γth computational moment of the decoding effort in part (1) is bounded if (k(R)+1)σ(∞) > γ.

In part (2) of the modified algorithm, we compute the three Pareto exponents σ_1(n), σ_2(n), and σ_3(n) given that decoding starts in group 1, group 2, or group 3. We then begin decoding in the group with the largest exponent. After decoding each stream, this procedure is repeated, thereby assuring that each successive stream is easier to decode than the previous one. If L(k(R)) is the number of steps needed to decode at least one of the k(R) remaining streams, then P[L(k(R)) > ℓ] is upper bounded by the probability that there is a set of k(R) streams that need more than ℓ steps each to decode by the three group bootstrap decoding algorithm. Since the decoding of the last k(R) streams is not independent, the γth computational moment of the decoding effort in part (2) is bounded if max[σ_1(n), σ_2(n), σ_3(n)] > γ.

In bounding the decoding effort for the complete modified algorithm, we must consider the fact that after the first m-k(R) streams have been decoded, any of the situations in the set 𝒥 may describe the distribution of the remaining k(R) streams. Since in part (1) we decode the m-k(R) easiest streams, we are not free to choose the situation which would give us the best Pareto exponent for part (2). Hence the worst case must be assumed, and the bounding condition in part (2) minimized over all situations in 𝒥.

Finally, since the decoding effort must be bounded for both part (1) and part (2), the γth computational moment of the decoding effort is bounded if min{(k(R)+1)σ(∞), min_{n∈𝒥} max[σ_1(n), σ_2(n), σ_3(n)]} > γ. We can now summarize as follows:

Theorem: The modified three group bootstrap decoding algorithm leads to a finite γth moment of computation per decoded digit if

$$\min\Big\{(k(R)+1)\,\sigma(\infty),\ \min_{\mathbf{n}\in\mathcal{J}} \max\big[\sigma_1(\mathbf{n}), \sigma_2(\mathbf{n}), \sigma_3(\mathbf{n})\big]\Big\} > \gamma \qquad (10)$$

where k(R) is the unique integer satisfying (7), σ(∞) is the unique solution of (8), and σ_i(n) is the unique solution of (6), i = 1, 2, 3.


It is necessary to derive the above bound in terms of a modified decoding algorithm due to the difficulties involved in taking the dependencies of the bootstrap algorithm into account. It should also be noted that this is the essential difference between the bounding technique in part (2) of the modified algorithm and that used in part (1). In the latter case the decoding is independent and we were able to obtain a tight bound on the decoding effort. However, in part (2) the decoding is dependent, and we were forced to upper bound the probability that there is a set of k(R) streams that need more than ℓ steps to decode.

Now define R_boot(γ) as the supremum of rates for which (10) is satisfied. Since the average computation will be bounded for the three group bootstrap decoding algorithm if R < R_boot(1), R_boot(1) is a lower bound on the R_comp of this decoding scheme.

We can evaluate R_boot(γ) by computing the differences


r min[max(E (c^) ,E (o g ) , E n ( a j) ) ] " ( n ). 

7 L - a x =7 - a 2 = 7 - J ^=7 J 


for k = 3, 4, ... until their value becomes negative, where 𝒥 = {n = (n_1, n_2, n_3) | n_1 + n_2 + n_3 = k}. If this takes place


for k = k , then 
r 


KbootW**** 


^min 

y J 




T- E «$> 


( 12 ) 


✓ - 

where J = jji = (n x , rig, n^) j n 2 + + n^ = k + j . 

It remains to specify the elements of the set 𝒥. Assume that m is a multiple of 3 and that the original distribution of the streams is n_1 = n_2 = n_3 = m/3. The problem is to specify the number of ways of arranging k(R) streams into 3 groups of size n_1, n_2, and n_3, respectively, such that n_1 + n_2 + n_3 = k(R) and n_1, n_2, and n_3 are always less than or equal to m/3. We will not consider a relabeling of the groups to constitute an additional member of 𝒥, since the labeling of the groups in the bootstrap decoding algorithm is immaterial.


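First consider the number of ways #𝒥* of arranging n_1, n_2, and n_3 such that n_1 + n_2 + n_3 = k(R), with no limits placed on the size of the groups. We find

$$\#\mathcal{J}^* = \begin{cases} \dfrac{1}{6}\Big[\dbinom{k(R)+2}{2} + 3\Big(\Big\lfloor \dfrac{k(R)}{2}\Big\rfloor + 1\Big) + 2\Big] & \text{if } 3 \mid k(R) \\[2ex] \dfrac{1}{6}\Big[\dbinom{k(R)+2}{2} + 3\Big(\Big\lfloor \dfrac{k(R)}{2}\Big\rfloor + 1\Big)\Big] & \text{if } 3 \nmid k(R) \end{cases} \qquad (13)$$

where 3 | k(R) means "3 divides k(R)", 3 ∤ k(R) means "3 does not divide k(R)", and ⌊I⌋ is the largest integer less than or equal to I.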


Now consider the limits placed on n_1, n_2, and n_3, viz., that they cannot exceed m/3. Letting #𝒥 be the size of the set 𝒥, we arrive at the following formulas:

Case 1. For 1 ≤ k(R) ≤ m/3,

$$\#\mathcal{J} = \#\mathcal{J}^* \qquad (14)$$

Case 2. For m/3 < k(R) ≤ ⌊m/2⌋,

$$\#\mathcal{J} = \begin{cases} \#\mathcal{J}^* - 2\Big(1+2+\cdots+\Big\lceil \dfrac{k(R)-m/3}{2}\Big\rceil\Big) & \text{if } k(R)-m/3 \text{ is even} \\[2ex] \#\mathcal{J}^* - 2\Big(1+2+\cdots+\Big\lceil \dfrac{k(R)-m/3}{2}\Big\rceil\Big) + \dfrac{k(R)-m/3+1}{2} & \text{if } k(R)-m/3 \text{ is odd} \end{cases} \qquad (15)$$

where ⌈I⌉ is the least integer greater than or equal to I.

Case 3. For ⌊m/2⌋ < k(R) ≤ m, we can use the fact that #𝒥 is symmetric about m/2, since specifying the distribution of the streams left to be decoded is equivalent to specifying the distribution of the streams already decoded.


We will now illustrate the use of these formulas by considering an example with m = 21 and n = (7, 7, 7) as the original distribution of groups.


k(R)   #𝒥   𝒥
  1     1   {(0,0,1)}
  2     2   {(0,0,2); (0,1,1)}
  3     3   {(0,0,3); (0,1,2); (1,1,1)}
  4     4   {(0,0,4); (0,1,3); (0,2,2); (1,1,2)}
  5     5   {(0,0,5); (0,1,4); (0,2,3); (1,1,3); (1,2,2)}
  6     7   {(0,0,6); (0,1,5); (0,2,4); (0,3,3); (1,1,4); (1,2,3); (2,2,2)}
  7     8   {(0,0,7); (0,1,6); (0,2,5); (0,3,4); (1,1,5); (1,2,4); (1,3,3); (2,2,3)}
  8     9   {(0,1,7); (0,2,6); (0,3,5); (0,4,4); (1,1,6); (1,2,5); (1,3,4); (2,2,4); (2,3,3)}
  9    10   {(0,2,7); (0,3,6); (0,4,5); (1,1,7); (1,2,6); (1,3,5); (1,4,4); (2,2,5); (2,3,4); (3,3,3)}
 10    10   {(0,3,7); (0,4,6); (0,5,5); (1,2,7); (1,3,6); (1,4,5); (2,2,6); (2,3,5); (2,4,4); (3,3,4)}
 11    10   {(0,4,7); (0,5,6); (1,3,7); (1,4,6); (1,5,5); (2,2,7); (2,3,6); (2,4,5); (3,3,5); (3,4,4)}
 12    10   {(0,5,7); (0,6,6); (1,4,7); (1,5,6); (2,3,7); (2,4,6); (2,5,5); (3,3,6); (3,4,5); (4,4,4)}
 13     9   {(0,6,7); (1,5,7); (1,6,6); (2,4,7); (2,5,6); (3,3,7); (3,4,6); (3,5,5); (4,4,5)}
 14     8   {(0,7,7); (1,6,7); (2,5,7); (2,6,6); (3,4,7); (3,5,6); (4,4,6); (4,5,5)}
 15     7   {(1,7,7); (2,6,7); (3,5,7); (3,6,6); (4,4,7); (4,5,6); (5,5,5)}
 16     5   {(2,7,7); (3,6,7); (4,5,7); (4,6,6); (5,5,6)}
 17     4   {(3,7,7); (4,6,7); (5,5,7); (5,6,6)}
 18     3   {(4,7,7); (5,6,7); (6,6,6)}
 19     2   {(5,7,7); (6,6,7)}
 20     1   {(6,7,7)}



Clearly, if m is not a multiple of 3 or if the original group distribution is not symmetric, these formulas get more complicated.
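The counting formulas can be checked against the example. The Python sketch below (function names ours) computes #𝒥* from a closed form obtained by counting unordered triples with Burnside's lemma, which is our choice of formula. It then applies Cases 1 to 3 for groups originally of size m/3 and compares the result with a brute-force count of the triples n_1 ≤ n_2 ≤ n_3 ≤ m/3 with n_1 + n_2 + n_3 = k(R). For m = 21 it reproduces the #𝒥 column of the table above.

```python
from math import comb


def count_unrestricted(k):
    """#J*: unordered triples n1 <= n2 <= n3 with n1 + n2 + n3 = k (no size limit)."""
    return (comb(k + 2, 2) + 3 * (k // 2 + 1) + (2 if k % 3 == 0 else 0)) // 6


def count_capped(k, m):
    """#J for three groups of original size m/3 (m a multiple of 3), Cases 1 to 3."""
    g = m // 3
    if k <= g:                                   # Case 1: no limit is active
        return count_unrestricted(k)
    if k <= m // 2:                              # Case 2: remove triples with a part > m/3
        d = k - g
        c = -(-d // 2)                           # ceil(d / 2)
        excluded = c * (c + 1)                   # 2 (1 + 2 + ... + c)
        if d % 2 == 1:
            excluded -= (d + 1) // 2
        return count_unrestricted(k) - excluded
    return count_capped(m - k, m)                # Case 3: symmetry about m/2


def count_brute(k, m):
    """Direct enumeration of n1 <= n2 <= n3 <= m/3 with n1 + n2 + n3 = k."""
    g = m // 3
    return sum(1 for n1 in range(g + 1) for n2 in range(n1, g + 1)
               if n2 <= k - n1 - n2 <= g)
```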

It is also helpful to have an algorithm for generating the members of the set 𝒥 for a given k(R). Such an algorithm follows:

(1) n_1 = max{0, k(R) - 2m/3}
(2) n_2 = max{n_1, k(R) - m/3 - n_1}
(3) n_3 = k(R) - n_1 - n_2
(4) WRITE (n_1, n_2, n_3)
(5) IF n_3 ≤ n_2 + 1, GO TO (9)
(6) n_2 = n_2 + 1
(7) n_3 = n_3 - 1
(8) GO TO (4)
(9) n_1 = n_1 + 1
(10) IF n_1 ≤ k(R)/3, GO TO (2)
(11) STOP
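The eleven steps translate directly into a generator. In the Python sketch below (names ours), the GO TO structure becomes two nested loops. The test of step (10) is moved to the top of the outer loop, which is equivalent for 0 ≤ k(R) ≤ m because step (1) already gives n_1 ≤ k(R)/3. For m = 21 and k(R) = 8 it yields the nine triples listed in the table above.

```python
def generate_situations(k, m):
    """Yield the members (n1, n2, n3) of J for k(R) = k, groups of original size m/3."""
    g = m // 3                           # m/3, the largest allowed group size
    n1 = max(0, k - 2 * g)               # step (1)
    while 3 * n1 <= k:                   # step (10): n1 <= k(R)/3
        n2 = max(n1, k - g - n1)         # step (2): keeps n3 <= m/3
        n3 = k - n1 - n2                 # step (3)
        while True:
            yield (n1, n2, n3)           # step (4): WRITE
            if n3 <= n2 + 1:             # step (5): n2 would overtake n3
                break
            n2 += 1                      # step (6)
            n3 -= 1                      # steps (7) and (8)
        n1 += 1                          # step (9)
```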

As an aside to the above discussion, let us consider an alternate way of deriving a lower bound on R_boot(γ) for three group bootstrap decoding. We will proceed as follows:

(1) Compute the best Pareto exponent max(σ_1(n), σ_2(n), σ_3(n)) that can be obtained using the three group bootstrap decoding algorithm starting from all possible situations n, i.e., all n ∈ 𝒮 = {n = (n_1, n_2, n_3) : n_1 + n_2 + n_3 = k(R), 1 ≤ k(R) ≤ m}. (Note that 𝒥 is the set of all n with a fixed k(R), whereas 𝒮 is the set of all n with any k(R).)

(2) Let 𝒮* = {n : max(σ_1(n), σ_2(n), σ_3(n)) > γ}. (Note that if n' = (n_1', n_2', n_3') ∈ 𝒮*, then any n'' = (n_1'', n_2'', n_3'') which can be obtained from n', i.e., n_1'' ≤ n_1', n_2'' ≤ n_2', and n_3'' ≤ n_3', also belongs to 𝒮*. This saves us the task of computing max(σ_1(n), σ_2(n), σ_3(n)) for all n ∈ 𝒮. Also note that an n' with a large k(R) will in general have a smaller Pareto exponent than an n'' with a smaller k(R) which cannot be obtained from n', since we would expect the parity information to speed up decoding more in the latter case.)

(3) Compute the exponent k̂(R)σ(∞) for ordinary sequential decoding which leaves the decoder in a situation n ∈ 𝒮*. (Note that k̂(R) need not be an integer.)

(4) R_boot(γ) is then defined as the supremum of rates for which



min £max(<r^(n) 


J 




(16) 


is satisfied. 

The main difficulty in computing this bound is in finding the exponent for the ordinary sequential decoding portion of the algorithm. Let k_max(R) be the largest value of k(R) for any n ∈ 𝒮*, and let k_min(R) be the smallest value of k(R) for any n which cannot be obtained from another member of 𝒮* with a larger value of k(R). Then it may appear that by suitable combinatorial arguments, k̂(R) could be shown to be in the range k_min(R) < k̂(R) < k_max(R). However, in the limit of large ℓ, terms with smaller values of k(R) dominate terms with larger values of k(R), and hence k̂(R) = k_min(R). Therefore the bound obtained using this method is the same as the original bound.

Finally, we will say a few words about extending the results of this bound to other parity-check schemes. In particular, consider the following (n-1) × n array (n ≥ 3):

    1 1 0 0 ... 0 0 0
    0 1 1 0 ... 0 0 0
          ...
    0 0 0 0 ... 0 1 1

We can then form a parity-check matrix H for an n-group code by repeating each column of the above array m/n times, resulting in an R = (m-n+1)/m block code. For example, the H matrix for the R = 25/28 4-group code is

        | 1111111 1111111 0000000 0000000 |
    H = | 0000000 1111111 1111111 0000000 |     (17)
        | 0000000 0000000 1111111 1111111 |

Columns 1-7 constitute group 1, columns 8-14 constitute group 2, etc. Note that for any given codeword, the parity of each group must be the same. Hence once one group is decoded correctly, the parity of each of the other groups is known, which is a significant aid to finishing the decoding of the other groups. Also note that the row space of H (the set of all parity checks) is completely symmetric with respect to the labeling of the groups. Therefore the labeling of the groups is immaterial, as was mentioned before in specifying the members of the set 𝒥.

It should be evident that the arguments used in finding an upper bound on the moments of decoding effort for the 3-group code can be extended directly to group codes of higher order. The formulas for specifying the size of the set 𝒥 and the algorithm for generating the members of 𝒥, however, must be restated for each particular case. This will be carried out upon successful completion of the computer calculations necessary to plot the bounds for the 3-group code.
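The construction of (17) can be checked mechanically. The Python sketch below (helper names ours) builds H for n groups of g columns each, with row r covering groups r and r+1 as in the (n-1) × n array above. It then enumerates the codewords of a few small cases by brute force and confirms the two properties used in the text. First, there are 2^(m-n+1) codewords, i.e. R = (m-n+1)/m. Second, every codeword has the same parity in every group.

```python
from itertools import product


def group_code_H(n_groups, g):
    """Parity-check matrix of an n-group code with g columns per group.

    Each column of the (n-1) x n array is repeated g times, so row r has ones
    exactly in groups r and r+1 (0-based)."""
    return [[1 if grp in (r, r + 1) else 0
             for grp in range(n_groups) for _ in range(g)]
            for r in range(n_groups - 1)]


def codewords(H):
    """Brute-force enumeration of all x with x H^T = 0 over GF(2)."""
    m = len(H[0])
    for x in product((0, 1), repeat=m):
        if all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H):
            yield x


def group_parities(x, n_groups, g):
    """Mod 2 sum of the digits of each group of codeword x."""
    return [sum(x[k * g:(k + 1) * g]) % 2 for k in range(n_groups)]
```

For example, group_code_H(4, 7) returns the three rows of (17).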

Reference 

1. F. Jelinek and J. Cocke, "Bootstrap Hybrid Decoding for Symmetrical Binary Input Channels," Information and Control, April 1971.
II-G. Group Code Results Applicable to Bootstrap Decoding

The results of this section were obtained by D. Costello while he was a research associate of the project.

1. Extending the Upper Bound on the Moments of the Decoding Effort to n-Group Codes

The characteristic feature of all n-group codes is that once the parity of any one group is decoded, the parity of all the other groups is immediately known to be the same.

An n-group code contains n-1 parity checks, i.e., the H matrix has n-1 rows. The columns of H consist of the following set of n vectors of length n-1, each of which may appear more than once:

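$$\begin{pmatrix}1\\0\\0\\\vdots\\0\\0\end{pmatrix},\quad \begin{pmatrix}1\\1\\0\\\vdots\\0\\0\end{pmatrix},\quad \begin{pmatrix}0\\1\\1\\\vdots\\0\\0\end{pmatrix},\quad \dots,\quad \begin{pmatrix}0\\0\\0\\\vdots\\1\\1\end{pmatrix},\quad \begin{pmatrix}0\\0\\0\\\vdots\\0\\1\end{pmatrix}$$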

The number of columns in which each of these vectors appears 
determines the size of each of the n groups. For convenience 
we will assume that all groups are of the same size. 
For example,

        | 1 1 1 1 1 1 0 0 0 0 0 0 |
    H = | 0 0 0 1 1 1 1 1 1 0 0 0 |
        | 0 0 0 0 0 0 1 1 1 1 1 1 |

is the parity check matrix for an R = 9/12 4-group code.

Note that the first row of the H matrix forces the parity 
of the first group to be the same as the parity of the second 
group, the second row of H forces the parity of the second 
group to be the same as the parity of the third group, and 
so on. Thus we get the property of group codes mentioned 
previously. Also note that all n-group codes are very high
rate codes with minimum distance 2, i.e., they only detect 
single errors in an algebraic sense. However, this does not 
militate against their use as algebraic codes in the bootstrap 
hybrid decoding scheme. In fact, their simple structure 
makes them especially attractive for calculating the error 
exponent function. (NOTE: The word "group" here should not 

be confused with the usual notion of a group (linear) code.) 

When using group codes, once we have decoded a single 
group, the parity of the other groups is known and they can 
be decoded independently as in the single parity check case. 
Hence if we desire high rates, it is also advantageous to
keep the group sizes as small as possible. 

EXAMPLE

Assume that we wish to use an algebraic code of rate about 9/10. With a single parity check the group size is 10. With two parity checks, a three group code with R = 19/21 has group size 7. However, we cannot continue to decrease the group size by increasing the number of groups. With three parity checks, a four group code with R = 29/32 has group size 8. In general, we require that R = [gn - (n-1)]/gn = 9/10, where g is the group size and n is the number of groups. This implies that gn = 10(n-1), or lim_{n→∞} g = 10, the same group size required by a single parity check. Clearly, for a given rate R, there is an optimum group number n which yields the smallest possible group size g.
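The arithmetic of this example is easy to reproduce. Using exact fractions, the sketch below computes R = [gn - (n-1)]/gn and the smallest integer group size g whose rate is at least a target rate. The bound g ≥ (n-1)/[n(1 - R)] follows from rearranging the rate formula. The helper names are ours, and we only consider n ≥ 3 groups, as in the example.

```python
import math
from fractions import Fraction


def rate(g, n):
    """Rate of an n-group code with group size g: (gn - (n - 1)) / gn."""
    return Fraction(g * n - (n - 1), g * n)


def smallest_group_size(target, n):
    """Smallest integer g with rate(g, n) >= target, i.e. g >= (n-1) / (n (1 - target))."""
    return math.ceil((n - 1) / (n * (1 - target)))
```

For a target of 9/10 this gives g = 7 for n = 3 and g = 8 for n = 4 and n = 5, and g tends to 10 as n grows, so among n ≥ 3 the smallest group size is reached at n = 3 for this rate.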

The derivation of the upper bound on the moments of the decoding effort given for the three group code can be extended to higher order group codes. The only difference is that a new algorithm is needed to generate the set S of possible situations in each case, and the formula for the error exponent function E_k(σ) must be generalized.

2. A Lower Bound on the Moments of the Decoding Effort for Group Codes

Proceeding analogously to the derivation of the lower bound on the moments of the decoding effort for the single parity check case, we can derive a similar lower bound for all group codes. In particular, for the three group code,

R_boot(γ) is the infimum (greatest lower bound) of rates

for which min {min [max (^(n), a 2 '(n), a (.n ) ) ] , 

[ S-, d 

max (a^Cn), (n) , o^Cn))j} ^ y , where 
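S_k = {n : n_1 + n_2 + n_3 = k}, S_3 = {n : n_1 + n_2 + n_3 = 3},

σ_i(n) is the solution of R = E^i(σ)/σ, and E^i(σ) is the error exponent function for a given situation when decoding begins in group i. In order to compute R_boot(γ) we must compute the differences

$$\frac{k}{\gamma}\min_{S_k}\Big[\max\Big(E^1\big(\tfrac{\gamma}{k}\big), E^2\big(\tfrac{\gamma}{k}\big), E^3\big(\tfrac{\gamma}{k}\big)\Big)\Big] - \frac{k+1}{\gamma}\min_{S_{k+1}}\Big[\max\Big(E^1\big(\tfrac{\gamma}{k+1}\big), E^2\big(\tfrac{\gamma}{k+1}\big), E^3\big(\tfrac{\gamma}{k+1}\big)\Big)\Big]$$

for k = 4, 5, ..., until their value becomes negative. If this takes place at k = k_+, then

$$R_{\mathrm{boot}}(\gamma) = \min\Big\{\frac{1}{\gamma}E_3(\gamma),\ \frac{k_+}{\gamma}E_{k_+}\Big(\frac{\gamma}{k_+}\Big)\Big\}$$

where E_k(γ/k) is

$$E_k\Big(\frac{\gamma}{k}\Big) = \min_{S_k}\Big[\max\Big(E^1\big(\tfrac{\gamma}{k}\big), E^2\big(\tfrac{\gamma}{k}\big), E^3\big(\tfrac{\gamma}{k}\big)\Big)\Big]$$

and E_3(γ) is

$$E_3(\gamma) = \min_{S_3}\big[\max\big(E^1(\gamma), E^2(\gamma), E^3(\gamma)\big)\big].$$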

Again the extension of the lower bound to all group codes 
depends only on the generalization of the function E k (a) 
and on a new algorithm to generate the set S. 



3. Proof That Knowledge of the Syndrome Is Equivalent to Knowledge of the Group Parity

We wish to show that the ratio that enters into the calculation of the likelihood function, viz., P(y/x_i)/P(y), is equivalent to the ratio P(s, y_i/x_i)/P(s, y_i), where s is the syndrome sequence and the ith digit is being decoded. First we compute

$$P(\mathbf{y}) = \sum_{C} P(\mathbf{y},\mathbf{x}) = \sum_{C} P(\mathbf{y}/\mathbf{x})\,P(\mathbf{x}) = 2^{-k}\sum_{C}\ \prod_{j=1}^{n} P(y_j/x_j)$$

where the rate of the algebraic code being used is k/n and C is the set of all codewords. Similarly,


$$P(\mathbf{y}/x_i) = P(y_i/x_i)\,P(y_1,\dots,y_{i-1},y_{i+1},\dots,y_n/x_i) = P(y_i/x_i)\,2^{-(k-1)}\sum_{C_i}\ \prod_{j\neq i} P(y_j/x_j)$$

where C_i is the set of all codewords whose ith component is x_i (half of the codewords in C for a linear code). Hence we obtain the ratio

$$\frac{P(\mathbf{y}/x_i)}{P(\mathbf{y})} = \frac{2P(y_i/x_i)\sum_{C_i}\prod_{j\neq i}P(y_j/x_j)}{\sum_{C}\prod_{j=1}^{n}P(y_j/x_j)}$$


Now the ratio P(s, y_i/x_i)/P(s, y_i) must be determined. Beginning with the denominator, we find that

$$P(\mathbf{s}, y_i) = \sum_{\mathbf{y}\in Y_i(\mathbf{s})} P(\mathbf{y})$$

where Y_i(s) is the set of all possible received sequences y which have syndrome s and whose ith component is y_i. Since there are 2^k equally likely received sequences corresponding to each syndrome and half of these have an ith component equal to y_i,

$$P(\mathbf{s}, y_i) = 2^{k-1}\, P(\mathbf{y} : \mathbf{y}H = \mathbf{s})$$

where H is the parity check matrix. (Note that in general P(y) depends upon y, but that those particular received sequences which result in a given syndrome are all equally likely. For example, the set of all codewords results in the syndrome s = 0, and they are clearly equally likely.) But

$$P(\mathbf{y} : \mathbf{y}H = \mathbf{s}) = 2^{-k}\sum_{C}\ \prod_{j=1}^{n} P(y_j/x_j)$$

where the evaluation is the same for all y that result in a given syndrome. Hence,

$$P(\mathbf{s}, y_i) = \frac{1}{2}\sum_{C}\ \prod_{j=1}^{n} P(y_j/x_j)$$

where the products are taken for any y such that yH = s. Since, by the same argument, the 2^{k-1} sequences of Y_i(s) are also equally likely given x_i, each with the probability P(y/x_i) computed above,

$$P(\mathbf{s}, y_i/x_i) = P(y_i/x_i)\,P(\mathbf{s}/y_i, x_i) = P(y_i/x_i)\sum_{C_i}\ \prod_{j\neq i} P(y_j/x_j)$$

where the products are taken for any y such that yH = s. Hence we obtain the ratio

$$\frac{P(\mathbf{s}, y_i/x_i)}{P(\mathbf{s}, y_i)} = \frac{2P(y_i/x_i)\sum_{C_i}\prod_{j\neq i}P(y_j/x_j)}{\sum_{C}\prod_{j=1}^{n}P(y_j/x_j)} = \frac{P(\mathbf{y}/x_i)}{P(\mathbf{y})}$$


In the case of three group bootstrap decoding, this result states that knowledge of the two syndrome digits is equivalent to knowledge of the three group parity digits. However, the simplest way to calculate the likelihood function is to use the formulas based on the three group parity digits, since these formulas take advantage of the independence among the three groups.
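The equivalence can also be confirmed numerically on a small case. The Python sketch below (names ours) takes the three group code with two digits per group, whose parity-check matrix has rows 111100 and 001111. It enumerates every received word y of length 6 on a BSC and compares P(y/x_i)/P(y) with P(s, y_i/x_i)/P(s, y_i). The latter is computed by summing over the received words with the same syndrome and the same ith digit. The check relies on the channel being a BSC, where received words in the same coset are equally likely, as the proof assumes.

```python
from itertools import product


def syndrome(y, H):
    """Syndrome yH^T over GF(2), H given as a list of rows."""
    return tuple(sum(h * b for h, b in zip(row, y)) % 2 for row in H)


def bsc(y, x, p):
    """P(y/x) for whole vectors on a BSC with crossover probability p."""
    prob = 1.0
    for yj, xj in zip(y, x):
        prob *= 1 - p if yj == xj else p
    return prob


def max_ratio_gap(H, p, i):
    """Largest |P(y/x_i)/P(y) - P(s,y_i/x_i)/P(s,y_i)| over all y and both x_i."""
    n = len(H[0])
    words = list(product((0, 1), repeat=n))
    code = [x for x in words if not any(syndrome(x, H))]

    def p_y(y):                    # P(y) = 2^-k * sum over all codewords
        return sum(bsc(y, c, p) for c in code) / len(code)

    def p_y_given(y, xi):          # P(y/x_i) = 2^-(k-1) * sum over codewords with c_i = x_i
        sub = [c for c in code if c[i] == xi]
        return sum(bsc(y, c, p) for c in sub) / len(sub)

    gap = 0.0
    for y in words:
        s = syndrome(y, H)
        coset = [v for v in words if syndrome(v, H) == s and v[i] == y[i]]
        den = sum(p_y(v) for v in coset)                       # P(s, y_i)
        for xi in (0, 1):
            lhs = p_y_given(y, xi) / p_y(y)
            rhs = sum(p_y_given(v, xi) for v in coset) / den   # P(s,y_i/x_i) / P(s,y_i)
            gap = max(gap, abs(lhs - rhs))
    return gap
```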
II-H. Optimal Decoding of Convolutional Codes for Finite State Channels and its Application to Bootstrap Decoding

1. Introduction

In this section we describe a method of decoding convolutional codes which minimizes the per-bit probability of error (Viterbi decoding minimizes the probability of erroneous codeword decoding). This method applies to all linear codes (see Sections 5 and 6) and can be used in conjunction with arbitrary discrete finite state channels. The complexity of the method grows as K2^u, where u is either the constraint length of the convolutional code or the syndrome size of the linear code. This work was done jointly with L. Bahl, J. Cocke, and J. Raviv of IBM.

While it is doubtful that one would actually wish to build decoders operating according to these methods, they can be effectively used to allow computation of optimal likelihood functions for the sequential decoding phase of a bootstrap scheme whose algebraic component is based on an arbitrary convolutional or linear code (see Section 7).

Moreover, we believe that our method will make possible the application of bootstrapping methods to finite state channels such as the Gilbert burst noise channel.



The per-bit probability of error will be minimized by finding the probabilities that the encoder was in a particular state at any time i. As a consequence, a posteriori probabilities that a particular digit was sent through the channel at some given time i will also be obtainable.

Our method will apply to finite state channels whose transmission probabilities are

$$Q^*(y_i, v_i \mid v_{i-1}, x_i) \qquad (1)$$

where y_i, x_i are the ith received and transmitted digits (both taken from finite alphabets), and v_i, v_{i-1} are the ith and (i-1)th channel states (taken from a finite state alphabet). The channel operates by the rule:

$$P\{y_1,\dots,y_n, v_1,\dots,v_n \mid v_0, x_1,\dots,x_n\} = \prod_{i=1}^{n} Q^*(y_i, v_i \mid v_{i-1}, x_i) \qquad (2)$$

Obviously, discrete memoryless channels are special cases of finite state channels, as is, for instance, the well-known Gilbert channel, which has a "good" and a "bad" state with transitions that are independent of channel inputs.
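Rule (2) also shows how P(y_1, ..., y_n | x_1, ..., x_n) is obtained: sum the product of Q* over all state sequences, weighted by the distribution of v_0. The Python sketch below does this one state at a time instead of enumerating state sequences. It checks the result against brute force for a Gilbert-type channel in which the state moves independently of the input and the digit is then flipped with a state-dependent probability. The transition matrix, flip probabilities and initial distribution are made-up illustrative values, and the function names are ours. When both states have the same flip probability, the channel reduces to a BSC.

```python
from itertools import product


def gilbert_Q(trans, eps):
    """Q*(y, v | v_prev, x) for a Gilbert-type channel: the state moves
    v_prev -> v with probability trans[v_prev][v] regardless of the input,
    and the digit is then flipped with probability eps[v]."""
    def Q(y, v, v_prev, x):
        out = eps[v] if y != x else 1 - eps[v]
        return trans[v_prev][v] * out
    return Q


def channel_prob(ys, xs, Q, init):
    """P(y | x) = sum over v_0..v_n of init[v_0] * prod_i Q*(y_i, v_i | v_{i-1}, x_i),
    accumulated one channel state at a time."""
    states = range(len(init))
    f = list(init)                     # f[v] = P(y_1..y_i, v_i = v | x)
    for y, x in zip(ys, xs):
        f = [sum(f[vp] * Q(y, v, vp, x) for vp in states) for v in states]
    return sum(f)


def channel_prob_bruteforce(ys, xs, Q, init):
    """The same quantity by explicit enumeration of state sequences (for checking)."""
    states = range(len(init))
    total = 0.0
    for vs in product(states, repeat=len(ys) + 1):
        prob = init[vs[0]]
        for i, (y, x) in enumerate(zip(ys, xs)):
            prob *= Q(y, vs[i + 1], vs[i], x)
        total += prob
    return total
```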

Since the natural transmission units of convolutional codes are branches (i.e., blocks of n digits), it will be convenient to define special notation for these. We will let capitals refer to branches, i.e.,

$$X_t = x_{tn+1}, x_{tn+2}, \dots, x_{(t+1)n}, \qquad Y_t = y_{tn+1}, y_{tn+2}, \dots, y_{(t+1)n} \qquad (3)$$

Also, we will define a new branch transmission probability

$$Q(Y_t, v_t \mid v_{t-1}, X_t) = \sum_{\mathcal{V}_t} Q^*(y_{tn+1}, v_1 \mid v_{t-1}, x_{tn+1}) \Big[\prod_{i=2}^{n-1} Q^*(y_{tn+i}, v_i \mid v_{i-1}, x_{tn+i})\Big] Q^*(y_{(t+1)n}, v_t \mid v_{n-1}, x_{(t+1)n}) \qquad (4)$$

where 𝒱_t is the set of all vectors (v_1, v_2, ..., v_{n-1}). As a result,

$$P\{Y_1, \dots, Y_k, v_1, \dots, v_k \mid v_0, X_1, \dots, X_k\} = \prod_{i=1}^{k} Q(Y_i, v_i \mid v_{i-1}, X_i)$$

2. Optimal Determination of Message Digits, 


Let the information blocks determining the coder output branches 
be I , I 2 , . . . (e.g. for a binary convolutional code of rate R = k/n, 

I corresponds to a block of k bits), and let the i— state of the 
encoder, S^, be given by the vector 


S. = (I., I. ,, 
l i l-l' 


L-u-S* 


(6) 
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where u is the constraint length of the code. Suppose a codeword is 
determined by T true information digits, and thus consists of T+U-l 
branches (the usual termination by u-1 dummy information 0-blocks is 
assumed). The encoder state sequence of interest the is 

S o = S l> *•*> S t+u-1 = - (7) 

If f is the code output function, then 

X t * f(I t' s t-l> (8) 

Let

$$\varepsilon_i = \begin{cases} 1 & \text{if the decoder determines the } i\text{th message bit incorrectly} \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$

Then the per-input-bit probability of error is

$$P_e = \frac{1}{Tk} E\left[\sum_{i=1}^{Tk} \varepsilon_i\right] = \frac{1}{Tk} \sum_{i=1}^{Tk} E[\varepsilon_i] \qquad (10)$$

and so we wish to minimize $E[\varepsilon_i]$ for all i. But for $1 \le j \le k$,

$$E[\varepsilon_{tk+j}] = \sum_{S_{t+1} \notin \mathcal{A}_j} P\{S_{t+1} \mid Y_1, \dots, Y_{T+u-1}\} \qquad (11)$$

where $\mathcal{A}_j$ denotes the set of states $S_{t+1}$ with first block $I_{t+1}$ (see (6)) whose $j$th digit agrees with the one actually sent. It follows that to minimize $P_e$ we ought to minimize the sums on the righthand side of (11) over all the possible sets $\mathcal{A}_j$. To be able to do so, we will find the probability terms of the sum of (11).
3. Determination of A Posteriori Encoder State Probabilities

Let us define super-states

$$U_t = (S_t, v_t) \qquad (12)$$

and the probability functions

$$\alpha_t(i,\ell) = P\{U_t = (i,\ell), Y_1, \dots, Y_t\} \qquad (13)$$

$$\beta_t(i,\ell) = P\{Y_{t+1}, \dots, Y_{T+u-1} \mid U_t = (i,\ell)\} \qquad (14)$$

$$\lambda_t(i,\ell) = P\{U_t = (i,\ell), Y_1, \dots, Y_{T+u-1}\} \qquad (15)$$

Now for $t \in [1, T+u-2]$

$$\lambda_t(i,\ell) = P\{U_t = (i,\ell), Y_1, \dots, Y_t\}\, P\{Y_{t+1}, \dots, Y_{T+u-1} \mid U_t = (i,\ell), Y_1, \dots, Y_t\} = \alpha_t(i,\ell)\, \beta_t(i,\ell) \qquad (16)$$

(if $U_t$ is known, events after time t do not depend on $Y_1, \dots, Y_t$), and

$$\lambda_{T+u-1}(i,\ell) = \alpha_{T+u-1}(i,\ell) \qquad (17)$$

We will show below that it is easy to compute $\alpha_t$ and $\beta_t$ recursively. In any case, it follows from (15) and (17) that

$$P\{S_t = i \mid Y_1, \dots, Y_{T+u-1}\} = \frac{\sum_{\ell} \lambda_t(i,\ell)}{\sum_{i,\ell} \alpha_{T+u-1}(i,\ell)} \qquad (18)$$

and so our task is to find $\lambda_t(i,\ell)$.

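Let the initial distribution of the channel state $v_0$ be given by

$$P\{v_0 = q\} = w(q) \qquad (19)$$

Then

$$\alpha_1(i,\ell) = \sum_q P\{U_1 = (i,\ell), Y_1 \mid U_0 = (0,q)\}\, w(q) \qquad (20)$$

and for $t = 2, 3, \dots, T+u-1$,

$$\begin{aligned}
\alpha_t(i,\ell) &= \sum_{j,m} P\{U_{t-1} = (j,m), U_t = (i,\ell), Y_1, \dots, Y_t\} \\
&= \sum_{j,m} P\{U_t = (i,\ell), Y_t \mid U_{t-1} = (j,m)\}\, P\{U_{t-1} = (j,m), Y_1, \dots, Y_{t-1}\} \\
&= \sum_{j,m} P\{U_t = (i,\ell), Y_t \mid U_{t-1} = (j,m)\}\, \alpha_{t-1}(j,m)
\end{aligned} \qquad (21)$$

where the middle equality follows from the fact that all events after time $t-1$ are independent of $Y_1, \dots, Y_{t-1}$ once the superstate $U_{t-1}$ is known. Similarly,

$$\beta_{T+u-2}(i,\ell) = \sum_m P\{U_{T+u-1} = (0,m), Y_{T+u-1} \mid U_{T+u-2} = (i,\ell)\} \qquad (22)$$

and for $t = 1, 2, \dots, T+u-3$,

$$\begin{aligned}
\beta_t(i,\ell) &= \sum_{j,m} P\{U_{t+1} = (j,m), Y_{t+1}, \dots, Y_{T+u-1} \mid U_t = (i,\ell)\} \\
&= \sum_{j,m} P\{Y_{t+2}, \dots, Y_{T+u-1} \mid U_{t+1} = (j,m)\}\, P\{U_{t+1} = (j,m), Y_{t+1} \mid U_t = (i,\ell)\} \\
&= \sum_{j,m} \beta_{t+1}(j,m)\, P\{U_{t+1} = (j,m), Y_{t+1} \mid U_t = (i,\ell)\}
\end{aligned} \qquad (23)$$

Relations (21) through (23) bear out our earlier contention that $\alpha$ and $\beta$ are recursively obtainable. It remains to specify the probabilities $P\{U_{t+1} = (j,m), Y_{t+1} \mid U_t = (i,\ell)\}$ that appear on the righthand sides of (20) through (23).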

Let

$$\mu(i,j) = \begin{cases} 1 & \text{if a one-step transition from state } i \text{ to state } j \text{ is possible} \\ 0 & \text{otherwise} \end{cases} \qquad (24)$$

and let $g(j)$ be the initial information block of the state j. Then

$$P\{U_{t+1} = (j,m), Y_{t+1} \mid U_t = (i,\ell)\} = \mu(i,j)\, Q\big(Y_{t+1}, m \mid \ell, f(g(j), i)\big)\, P\{I_{t+1} = g(j)\} \qquad (25)$$

In the usual situation in which all sequences are equally likely, $P\{I_{t+1} = g(j)\} = 2^{-k}$. It will, however, be useful later on to have the general expression (25).

We conclude this section by outlining the algorithm that will minimize the probability of bit error:

1) While the sequence $Y_1, \dots, Y_{T+u-1}$ is being received, the decoder computes recursively the probabilities $\alpha_t(i,\ell)$ [see (13)], using the relations (20), (21), and (25). The obtained values are stored for all $t = 1, \dots, T+u-1$ and all $i, \ell$. The amount of work involved is roughly that of forward Viterbi decoding.

2) The decoder then starts computing recursively the probabilities $\beta_{T+u-2}(i,\ell), \beta_{T+u-3}(i,\ell), \dots$ using relations (22), (23), and (25). When the $\beta_{T+u-2}(i,\ell)$ are computed, they and the stored $\alpha_{T+u-2}(i,\ell)$ are used to obtain $\lambda_{T+u-2}(i,\ell)$ [see (16)]. The latter then replace $\alpha_{T+u-2}(i,\ell)$ in storage. This is done in general, $\lambda_t(i,\ell)$ replacing $\alpha_t(i,\ell)$ for $t = T+u-3, T+u-4, \dots, 1$. The work involved in this stage of the algorithm is roughly equivalent to that of backward Viterbi decoding.

3) Finally, the stored $\lambda_t(i,\ell)$ are used to calculate $P\{S_t = i \mid Y_1, \dots, Y_{T+u-1}\}$ [see (18)] and the quantities

$$\Lambda_{t,i}(z) = \sum_{S \in \mathcal{A}_i(z)} P\{S_t = S \mid Y_1, \dots, Y_{T+u-1}\} \qquad (26)$$

where $\mathcal{A}_i(z)$ is the set of states whose initial block I has its $i$th digit ($i = 1, \dots, k$) equal to z. If

$$\Lambda_{t,i}(z^*) = \max_z \Lambda_{t,i}(z) \qquad (27)$$

the decoder decides that the $[(t-1)k+i]$th information digit was $z^*$.
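The three steps can be sketched in Python for a toy case. Everything concrete below is an assumption made for illustration: a rate-1/2 code with u = 3 and generators 7, 5 (octal), so that $S_t = (I_t, I_{t-1})$ and k = 1; a hypothetical two-state Gilbert channel with a BSC in each state; and dictionaries instead of the storage arrangement described above. The two termination blocks get prior probability 1 for the 0 block, which is where the general form of (25) is used. The function also returns the end-of-block channel-state distribution (28), which by (29) would serve as $w(\cdot)$ for the next block.

```python
from itertools import product

# Illustrative assumptions (not from the report): a rate-1/2 code with u = 3 and
# generators 7, 5 (octal), sent over a hypothetical two-state Gilbert channel.
G = [(1, 1, 1), (1, 0, 1)]                    # taps on (I_t, I_{t-1}, I_{t-2})
TRANS = [[0.9, 0.1], [0.3, 0.7]]              # P{v_i = b | v_{i-1} = a}
EPS = [0.02, 0.2]                             # BSC crossover in each channel state
ENC_STATES = list(product((0, 1), repeat=2))  # S_t = (I_t, I_{t-1}), k = 1


def f(bit, s_prev):
    """Code output function (8): X_t = f(I_t, S_{t-1})."""
    reg = (bit,) + s_prev
    return tuple(sum(a & b for a, b in zip(g, reg)) % 2 for g in G)


def q_branch(Y, v_end, v_start, X):
    """Branch law Q(Y, v_end | v_start, X) with the intermediate states summed out."""
    probs = {v_start: 1.0}
    for y, x in zip(Y, X):
        nxt = {0: 0.0, 1: 0.0}
        for vp, p in probs.items():
            for v in (0, 1):
                nxt[v] += p * TRANS[vp][v] * (EPS[v] if y != x else 1.0 - EPS[v])
        probs = nxt
    return probs[v_end]


def gamma(Y, i, ci, j, cj, prior_one):
    """Transition term (25): P{U_{t+1} = (j, cj), Y_{t+1} = Y | U_t = (i, ci)}."""
    if j[1] != i[0]:                          # mu(i, j) = 0
        return 0.0
    p_block = prior_one if j[0] == 1 else 1.0 - prior_one   # P{I_{t+1} = g(j)}
    return q_branch(Y, cj, ci, f(j[0], i)) * p_block


def decode(received, T, w):
    """Steps 1) to 3): returns the bit decisions and the end-of-block distribution (28)."""
    N = len(received)                         # T + u - 1 branches
    supers = [(s, c) for s in ENC_STATES for c in (0, 1)]
    priors = [0.5] * T + [0.0] * (N - T)      # dummy 0-blocks close the trellis
    # Step 1: alpha_t of (13), starting from U_0 = (0, q) with probability w(q).
    alpha = [{u: (w[u[1]] if u[0] == (0, 0) else 0.0) for u in supers}]
    for t in range(N):
        alpha.append({(j, cj): sum(alpha[t][(i, ci)] * gamma(received[t], i, ci, j, cj, priors[t])
                                   for (i, ci) in supers) for (j, cj) in supers})
    # Step 2: beta_t of (14) backwards, lambda_t = alpha_t * beta_t as in (16)-(17).
    beta = {u: (1.0 if u[0] == (0, 0) else 0.0) for u in supers}
    lam = {N: dict(alpha[N])}
    for t in range(N - 1, 0, -1):
        beta = {(i, ci): sum(gamma(received[t], i, ci, j, cj, priors[t]) * beta[(j, cj)]
                             for (j, cj) in supers) for (i, ci) in supers}
        lam[t] = {u: alpha[t][u] * beta[u] for u in supers}
    p_y = sum(alpha[N].values())              # P{Y_1, ..., Y_N}, cf. (33)
    # Step 3: with k = 1 the digit I_t is the first component of S_t; apply (26)-(27).
    bits = []
    for t in range(1, T + 1):
        p_one = sum(val for (s, _), val in lam[t].items() if s[0] == 1) / p_y
        bits.append(1 if p_one > 0.5 else 0)
    w_next = [alpha[N][((0, 0), q)] / p_y for q in (0, 1)]   # (28)
    return bits, w_next
```

Here the whole trellis of $\alpha$ values is kept in memory, which makes the linear growth of storage with T, discussed next, easy to see.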


Unfortunately, this algorithm requires quite a large storage. Its size grows linearly with block length T. It is not clear with what accuracy it is necessary to store the values $\alpha_t(i,\ell)$ and $\lambda_t(i,\ell)$.

In conclusion, let us observe that the computation of the probabilities $\alpha_t$ [see (20) and (21)] was based on the initial channel state distribution $w(q)$. At the beginning of the communication process, $w(\cdot)$ would normally be the stationary distribution of the states. However, it follows from (13) that

$$P\{v_{T+u-1} = q \mid Y_1, \dots, Y_{T+u-1}\} = \frac{\alpha_{T+u-1}(0,q)}{\sum_{\ell} \alpha_{T+u-1}(0,\ell)} \qquad (28)$$

and thus the w-function for the decoding of the second block would naturally be given by the relation

$$w(q) = P\{v_{T+u-1} = q \mid Y_1, \dots, Y_{T+u-1}\} \qquad (29)$$

where the conditioning random variables are those received in the first block. The definition of the $w(\cdot)$-function for the third and following blocks is similar. The important point is that no information about the starting state of any block gained through the decoding of previous blocks is ever lost.



4. Probabilities of Transmitted Digits

Sometimes it is of interest to determine the probabilities $P\{X_t \mid Y_1, \dots, Y_{T+u-1}\}$ that the transmitted branch was $X_t$, given that the branches $Y_1, \dots, Y_{T+u-1}$ were received (an application is given in Section 7). We now proceed to do so.

$X_t$ is fully determined by $S_{t-1}$ and $S_t$ (see (8)), so that

$$X_t = F(S_{t-1}, S_t) \qquad (30)$$


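Let $\mathcal{T}(X_t)$ be the set of all pairs $(S_{t-1}, S_t)$ for which (30) holds. Then

$$\begin{aligned}
P\{X_t \mid Y_1, \dots, Y_{T+u-1}\} &= \sum_{(i,j) \in \mathcal{T}(X_t)} P\{S_{t-1} = i, S_t = j \mid Y_1, \dots, Y_{T+u-1}\} \\
&= \sum_{(i,j) \in \mathcal{T}(X_t)} \sum_{\ell,m} P\{U_{t-1} = (i,\ell), U_t = (j,m) \mid Y_1, \dots, Y_{T+u-1}\}
\end{aligned} \qquad (31)$$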


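Therefore, it is desirable to determine the probability terms on the righthand side of (31). But

$$\begin{aligned}
P\{U_{t-1} = (i,\ell), U_t = (j,m), Y_1, \dots, Y_{T+u-1}\} &= P\{Y_{t+1}, \dots, Y_{T+u-1} \mid U_t = (j,m)\}\, P\{U_t = (j,m), Y_t \mid U_{t-1} = (i,\ell)\}\, P\{U_{t-1} = (i,\ell), Y_1, \dots, Y_{t-1}\} \\
&= \beta_t(j,m)\, P\{U_t = (j,m), Y_t \mid U_{t-1} = (i,\ell)\}\, \alpha_{t-1}(i,\ell)
\end{aligned} \qquad (32)$$

and from (15) and (17),

$$P\{Y_1, \dots, Y_{T+u-1}\} = \sum_{i,\ell} \alpha_{T+u-1}(i,\ell) \qquad (33)$$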
Combining (31) through (33) we thus get the formula

$$P\{X_t \mid Y_1, \dots, Y_{T+u-1}\} = \left[\sum_{i,\ell} \alpha_{T+u-1}(i,\ell)\right]^{-1} \sum_{(i,j) \in \mathcal{T}(X_t)} \sum_{\ell,m} \beta_t(j,m)\, P\{U_t = (j,m), Y_t \mid U_{t-1} = (i,\ell)\}\, \alpha_{t-1}(i,\ell) \qquad (34)$$
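Relations (32) through (34) can be checked numerically. The Python sketch below uses a hypothetical toy chain in which each superstate is collapsed to a single index with two values, and the made-up numbers GAMMA[t][a][b] stand in for $P\{U_{t+1} = b, Y_{t+1} \mid U_t = a\}$ with the received branch already fixed. It computes the pair posteriors by forward-backward recursion as in (32)-(33), compares them with brute-force enumeration of every superstate sequence, and then groups them by branch label as in (34). The labelling function passed to branch_posterior plays the role of F in (30) and is likewise hypothetical.

```python
from itertools import product

# Hypothetical toy numbers: K superstates, N branches. GAMMA[t][a][b] stands for
# P{U_{t+1} = b, Y_{t+1} | U_t = a} with the received branch Y_{t+1} already fixed.
K, N = 2, 4
INIT = [1.0, 0.0]                          # U_0 is known to be superstate 0
GAMMA = [[[0.5, 0.2], [0.1, 0.3]],
         [[0.4, 0.1], [0.2, 0.2]],
         [[0.3, 0.3], [0.25, 0.05]],
         [[0.6, 0.0], [0.35, 0.0]]]       # last step forces U_N = 0


def forward_backward():
    """alpha[t][a] and beta[t][a] for t = 0..N, as in (13)-(14) with beta[N] = 1."""
    alpha = [INIT[:]]
    for t in range(N):
        alpha.append([sum(alpha[t][a] * GAMMA[t][a][b] for a in range(K)) for b in range(K)])
    beta = [[1.0] * K for _ in range(N + 1)]
    for t in range(N - 1, -1, -1):
        beta[t] = [sum(GAMMA[t][a][b] * beta[t + 1][b] for b in range(K)) for a in range(K)]
    return alpha, beta


def pair_posteriors(t):
    """P{U_{t-1} = a, U_t = b | Y_1, ..., Y_N} from (32)-(33), t = 1..N."""
    alpha, beta = forward_backward()
    p_y = sum(alpha[N])
    return [[alpha[t - 1][a] * GAMMA[t - 1][a][b] * beta[t][b] / p_y for b in range(K)]
            for a in range(K)]


def pair_posteriors_bruteforce(t):
    """The same quantity by enumerating every superstate sequence U_0, ..., U_N."""
    num = [[0.0] * K for _ in range(K)]
    total = 0.0
    for path in product(range(K), repeat=N + 1):
        p = INIT[path[0]]
        for s in range(N):
            p *= GAMMA[s][path[s]][path[s + 1]]
        total += p
        num[path[t - 1]][path[t]] += p
    return [[x / total for x in row] for row in num]


def branch_posterior(t, label):
    """(34): add the pair posteriors of all transitions that carry the same branch label."""
    out = {}
    pair = pair_posteriors(t)
    for a in range(K):
        for b in range(K):
            out[label(a, b)] = out.get(label(a, b), 0.0) + pair[a][b]
    return out
```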





5. Generalization to All Linear Codes

The preceding results depend on the existence of the superstates $U_t$ whose knowledge allows the separation of the past (events before time t) from the future (events after t). As seen from (12), $U_t$ presupposes the existence of $S_t$, the encoder state. Our results would thus be generalizable to all codes for which a state could be defined, and therefore a coding trellis drawn.

Let H be the parity check matrix of a given linear $(n, n-r)$ code, and let $h_i$, $i = 1, \dots, n$, be the column vectors of H. Let c be a codeword. We will then define the states $S_t$, $t = 0, 1, \dots, n$, pertaining to c as follows:

$$S_0 = 0, \qquad S_t = S_{t-1} + c_t h_t = \sum_{i=1}^{t} c_i h_i, \quad t = 1, \dots, n \qquad (35)$$

Obviously, $S_n = 0$, and the current state $S_t$ is a function of the preceding state $S_{t-1}$ and the current input digit $c_t$ (the relationship is time varying!). Relation (35) can thus be used to draw a trellis with at most $2^r$ states $S_t$ per level. The appearance of the trellis will be similar to that for convolutional codes provided the vector set $\{h_n, h_{n-1}, \dots, h_{n-r+1}\}$ has rank r (which can always be arranged). For binary codes, there will exist two transitions out of every state $S_t$, $t = 0, 1, \dots, k-1$, and one transition out of every state $S_t$, $t = k, k+1, \dots, n-1$, where $k = n - r$. If it turns out that $\{h_1, \dots, h_j\}$ are linearly independent, then there will be one transition leading into every state $S_t$, $t = 1, 2, \dots, j$. An example of the trellis for the Hamming code with

$$H = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix}$$

is given in Figure 1. Unfortunately, the irregularity of the trellis is typical for the general case of block codes. Obviously, every transition corresponds to a single channel input digit only. Horizontal transitions (those to an identically indexed state) correspond to 0's, the remaining transitions to 1's.
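Relation (35) turns a parity check matrix directly into trellis levels. The Python sketch below is minimal and enumerates all $2^n$ binary words by brute force, which is only sensible for short codes. It uses the H of the (7,4) Hamming code shown above, keeps the words whose state sequence returns to $S_n = 0$ (exactly the codewords), and collects the states reached at each depth together with the labelled branches between levels.

```python
from itertools import product

# Parity check matrix of the (7,4) Hamming code given in the text (rows of H).
H = [[1, 0, 1, 1, 1, 0, 0],
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]
n, r = len(H[0]), len(H)
cols = [tuple(row[t] for row in H) for t in range(n)]   # h_1, ..., h_n


def add(s, h):
    """Componentwise mod-2 sum of two r-vectors."""
    return tuple((a + b) % 2 for a, b in zip(s, h))


def state_sequence(c):
    """S_0, ..., S_n of (35): S_t = S_{t-1} + c_t h_t, starting from S_0 = 0."""
    s = (0,) * r
    seq = [s]
    for ct, h in zip(c, cols):
        if ct:
            s = add(s, h)
        seq.append(s)
    return seq


# Codewords are exactly the words whose state sequence ends in S_n = 0.
codewords = [c for c in product((0, 1), repeat=n) if state_sequence(c)[-1] == (0,) * r]

# Trellis levels and (time-varying) branches (t, S_t, S_{t+1}, c_{t+1}).
levels = [set() for _ in range(n + 1)]
branches = set()
for c in codewords:
    seq = state_sequence(c)
    for t in range(n + 1):
        levels[t].add(seq[t])
    for t in range(n):
        branches.add((t, seq[t], seq[t + 1], c[t]))

print([len(L) for L in levels])   # prints [1, 2, 4, 8, 8, 4, 2, 1]
```

The level sizes never exceed $2^r = 8$, and each state at depths 0 through 3 has two outgoing branches while later states have one, as stated above for k = 4.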

Viterbi decoding, as well as the methods of the preceding sections, is clearly applicable to the trellises of linear block codes (it is even conceivable that sequential decoding can also be used). Since high-rate codes have relatively few states, the methods might even prove attractive in practice.
6. A "Time-Invariant" Trellis Diagram for Cyclic Codes

The trellis diagram of Figure 1 is time dependent. 

This unfortunate feature can be eliminated when the code is 
cyclic by defining the state in terms of the shift register 
realization of the encoder rather than in terms of the 
parity check matrix. This leads to a piecewise time- 
invariant trellis diagram, as illustrated by the following 
example. 

EXAMPLE 

Consider the 3-stage shift register encoder shown in 
Figure 2 for the (7,4) Hamming code. The switches are in 
positions A for four time units and then switch to positions
B for three time units. Taking the state to be the outputs 
of the three register stages, we can draw the trellis diagrams 
as in Figure 3. 

In part A of the diagram, for the states (000, 110, 010, 100) up branches correspond to input 0's and down branches to input 1's, whereas for the states (011, 101, 001, 111) up branches correspond to input 1's and down branches to input 0's. In part B all branches correspond to input 0's.

Note that part A and part B of the diagram, when considered separately, are both time-invariant, i.e., each state has exactly the same successors independent of time. This trellis diagram can be reduced to a state diagram whose transitions are labeled either A/B (where A is the input when the transition occurs in part A and B is the input when it occurs in part B) or just A (where A is the input when the transition occurs in part A and the transition does not occur in part B). For the (7,4) Hamming code under consideration, the state diagram is in Figure 4.

When all the information digits have been read into the encoder 
(at the end of part A), the path back to the all-zero state 
can be determined directly from the state diagram for part A 
by merely following the path indicated by the digits of the 
present state read in reverse order. For example, if we are 
in state 100 at the end of part A, then following the path 
indicated by the digits 001 returns us to the state 000. 

This form of the encoder results in relatively simple state diagrams for high-rate codes and relatively complex state diagrams for low-rate codes (since the number of states is $2^r$, where r is the number of redundant digits in the code).
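The following Python sketch shows a two-part encoder of this kind. The wiring is an assumption: a standard systematic dividing register for the generator polynomial $g(x) = 1 + x + x^3$, which generates exactly the code whose parity check matrix H appears in Section 5. Figure 2's actual connections and state labels may differ. In part A the next state depends only on the state and the input, and in part B only on the state, which is the time invariance noted above.

```python
from itertools import product

# Assumed wiring (hypothetical): systematic dividing register for g(x) = 1 + x + x^3.
G = [1, 1, 0, 1]                    # g_0, g_1, g_2, g_3
H = [[1, 0, 1, 1, 1, 0, 0],         # parity check matrix of Section 5
     [0, 1, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 1, 1]]
R, K = len(G) - 1, 4                # r = 3 register stages, k = 4 information digits


def step_a(state, bit):
    """Switches in position A: the input is sent and also fed back through g(x)."""
    fb = bit ^ state[R - 1]
    nxt = (fb,) + tuple(state[i - 1] ^ (G[i] & fb) for i in range(1, R))
    return nxt, bit


def step_b(state):
    """Switches in position B: feedback open, the last stage is sent, a 0 shifts in."""
    return (0,) + state[:-1], state[R - 1]


def encode(msg):
    """msg = (m_3, m_2, m_1, m_0), highest degree first.

    Returns the codeword as coefficients (c_0, ..., c_6) of c(x) and the state sequence."""
    state, sent, states = (0,) * R, [], [(0,) * R]
    for bit in msg:                 # part A: four time units
        state, out = step_a(state, bit)
        sent.append(out)
        states.append(state)
    for _ in range(R):              # part B: three time units
        state, out = step_b(state)
        sent.append(out)
        states.append(state)
    return sent[::-1], states       # digits were sent in the order c_6, c_5, ..., c_0


codebook = [encode(m)[0] for m in product((0, 1), repeat=K)]
```

Every one of the 16 outputs satisfies $Hc = 0$, and the register always returns to 000 at the end of part B.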



7. Application to Bootstrap Decoding


In this section we will state a particular application of the decoding methods of this paper to bootstrap decoding, but others are equally possible. Our example will be restricted to symmetrical, binary-input channels. Consider two convolutional codes $C_1$ and $C_2$. Use $C_1$ to encode $T_2$ blocks of information digits into $T_2$ blocks of $N_1 = (T_1 + u_1 - 1)n_1$ channel digits (the rate of $C_i$ is $R_i = k_i/n_i$ and its constraint length is $u_i$, $i = 1, 2$), and lay the resulting codewords next to each other (as indicated in Figure 5), obtaining a binary array of $N_1$ rows and $T_2$ columns. Next, take each row in the array of Figure 5 and use $C_2$ to encode it into a codeword of $N_2 = (T_2 + u_2 - 1)n_2$ channel digits, and lay the resulting codewords below each other, as indicated in Figure 6. The obtained binary array has $N_1$ rows and $N_2$ columns. Because of linearity, every column in this array is a codeword of the code $C_1$.

If the digits of Figure 6 are transmitted, the received digits can be used to form another $N_1 \times N_2$ array whose appearance is that of Figure 6. It is then possible to decode the array either row-wise (using code $C_2$) or column-wise (using code $C_1$) or both, and to do so, any convenient decoder may be used. If both constraint lengths $u_1$ and $u_2$ are relatively short, the methods of this paper may be used in both directions (see below); if $u_1$ is short and $u_2$ long, horizontal decoding may be carried out with the help of a sequential decoder.

In either case, the following interactive approach is suggested. The array of Figure 6 is transmitted by columns, i.e., first the digits of the first column in sequence, then those of the second column, etc. We will assume that the state process is irreducible, and that $N_1$ is large enough relative to the memory of the state process so that the channel is virtually memoryless along the horizontal direction of the array of Figure 6 (in case this assumption is not satisfied, it is in principle easy to modify the following approach appropriately).

The receiver works on the columns first, using the relations (29) to determine initial state distributions. The aim is to obtain the distributions (see Section 4)

$$P\{X_t \mid Y_1, \dots, Y_{T_1+u_1-1}\}, \qquad t = 1, 2, \dots, T_1+u_1-1 \qquad (36)$$

and

$$P\{v_{T_1+u_1-1} \mid Y_1, \dots, Y_{T_1+u_1-1}\} \qquad (37)$$

the latter in order to decode the next column. The probabilities (36) may be used to find the probabilities of transmission of individual digits in the various rows of the columns:

$$P\{x_{(t-1)n_1+j} \mid Y_1, \dots, Y_{T_1+u_1-1}\} = \sum P\{X_t \mid Y_1, \dots, Y_{T_1+u_1-1}\} \qquad (38)$$

where the sum is over all $X_t$ whose $j$th digit is $x_{(t-1)n_1+j}$.

When the work on the columns is completed, row decoding starts. The decoding of the $r$th row will utilize the probabilities $P\{x_r \mid Y_1, \dots, Y_{T_1+u_1-1}\}$ obtained for each of the $N_2$ columns. First, consider the case where row decoding utilizes the methods of this paper. Let $q_1(\cdot), \dots, q_{n_2}(\cdot)$ be the distributions (38) applicable to the $n_2$ digits on the branch at depth $(t+1)$ of the $r$th row. Because of our virtual independence assumptions, superstates can be replaced by encoder states $S_t$, so that the probabilities $\lambda_t(i)$ [the second variable is eliminated] will be based on the transition probabilities (compare with (25))

$$P\{S_{t+1} = j, Y_{t+1} \mid S_t = i\} = \mu(i,j)\, w\big(Y_{t+1} \mid f(g(j), i)\big)\, P\{I_{t+1} = g(j) \mid S_t = i\} \qquad (39)$$

where $w(\cdot \mid \cdot)$ is the transmission probability of the virtually memoryless row channel. The probability $P\{I_{t+1} = g(j) \mid S_t = i\}$ is obtained with the help of the probabilities $q_1(\cdot), \dots, q_{n_2}(\cdot)$ determined by column decoding. In fact, let the branch digits corresponding to the transition $g(j)$ out of state i be $x_1^*, \dots, x_{n_2}^*$. Then

$$P\{I_{t+1} = g(j) \mid S_t = i\} = \frac{\prod_{h=1}^{n_2} q_h(x_h^*)}{\sum \prod_{h=1}^{n_2} q_h(x_h)} \qquad (40)$$

where the sum in the denominator is over the sequences $x_1, \dots, x_{n_2}$ associated with the $2^{k_2}$ branches leaving state i.
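Formula (40) amounts to multiplying the column-decoding digit probabilities along each branch and normalizing over the branches that leave the state. A minimal Python sketch follows; it assumes the caller supplies the branch labels and their $n_2$ code digits, and the names q and branch_digits are ours, not the report's.

```python
from math import prod


def branch_prior(q, branch_digits):
    """P{I_{t+1} = g(j) | S_t = i} of (40) for every branch leaving state i.

    q             -- list of n_2 dicts; q[h][x] is the column-decoding probability
                     that digit h of the branch equals x
    branch_digits -- dict: branch label g(j) -> its n_2 code digits (x*_1, ..., x*_n2)
    """
    weights = {g: prod(q[h][x] for h, x in enumerate(digits))
               for g, digits in branch_digits.items()}
    total = sum(weights.values())
    return {g: wt / total for g, wt in weights.items()}


# Example: a hypothetical rate-1/2 row code state with two outgoing branches.
q = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}]
prior = branch_prior(q, {0: (0, 0), 1: (1, 1)})
```

In the example the branch labelled 0 receives weight $0.9 \cdot 0.2 = 0.18$ and the branch labelled 1 receives $0.1 \cdot 0.8 = 0.08$, so their normalized priors are $0.18/0.26$ and $0.08/0.26$.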

The aim of row decoding is to obtain probabilities $P\{x_r \mid Y_1, \dots, Y_{T_2+u_2-1}\}$, $r = 1, \dots, N_2$, to be used next in column decoding based again on the transition probabilities $P\{U_{t+1} = (j,m), Y_{t+1} \mid U_t = (i,\ell)\}$ [see (25)], where formula (40) enables utilization of information gained in row decoding. The process may be iterated any number of times. The last iteration performs the final decoding according to the three-step algorithm described in Section 3.


Let us next consider the case where the row constraint length $u_2$ is large so that sequential decoding must be used. When the first column decoding cycle is completed, the row decoder is in possession of probabilities $P\{x_r \mid Y_1, \dots, Y_{T_1+u_1-1}\}$ obtained by formula (38). Since row memory is assumed to be practically non-existent, the usual sequential algorithm is carried out. The difference is that the likelihood functions used on the $i$th branch digit are given by the formula


$$\log \frac{w(y_i \mid x_i)}{w_i(y_i)} \qquad (41)$$

where

$$w_i(y_i) = \sum_x w(y_i \mid x)\, q_i(x) \qquad (42)$$

It is, of course, through formula (42) that the sequential decoder utilizes information gained in column decoding. Sequential decoding on a given row continues until that row is decoded, or until the likelihood drops by so much that further advance is "hopeless" (this is similar to the original Bootstrap Decoding Algorithm). If the decoder advanced to depth J, it is assumed that all digits from depth 1 through J - t [for some judiciously chosen t] have been definitely decoded. This means that for the purpose of future column decoding, the probabilities $P\{I_{t+1} = g(j) \mid S_t = i\}$ are changed, some becoming zero [we assume that the sequential decoding involved row $tk_1 + r$, $r \in \{1, 2, \dots, k_1\}$]. After row decoding has been completed, column decoding whose aim is to obtain new probabilities (38) is performed on those columns where a change in some probabilities $P\{I_{t+1} = g(j) \mid S_t = i\}$ took place. This process is iterated until all rows have been completely sequentially decoded.
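The digit metric (41)-(42) can be sketched in Python as follows. We assume base-2 logarithms (the base is not specified above) and a channel given as a table w[(y, x)] = w(y | x). Any bias term that a particular sequential decoder adds to its metric is left out, as in (41). The BSC and the two distributions q are hypothetical examples.

```python
from math import log2


def row_metric(y, x, q, w):
    """Metric (41) for one branch digit: log2 [w(y | x) / w_i(y)], where
    w_i(y) = sum over x' of w(y | x') q(x'), as in (42).

    q -- dict, q[x'] = column-decoding probability that the digit equals x'
    w -- dict, w[(y, x')] = transmission probability w(y | x') of the row channel
    """
    w_i = sum(w[(y, xp)] * q[xp] for xp in q)
    return log2(w[(y, x)] / w_i)


bsc = {(0, 0): 0.9, (1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.1}   # hypothetical BSC, p = 0.1
uniform = {0: 0.5, 1: 0.5}      # column decoding told us nothing
confident = {0: 0.95, 1: 0.05}  # column decoding strongly favours 0
```

With the uniform q the metric for agreeing with a received 0 is $\log_2(0.9/0.5) = \log_2 1.8$. When column decoding already favours 0, agreeing with the received 0 earns less, because $w_i(0) = 0.86$, and a hypothesized 1 is penalized more heavily.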

Obviously, the above two applications to bootstrapping are very tentative. The precise algorithms must be determined by experimentation.



In conclusion we wish to point out that the column code need not be a convolutional one. As shown in Section 5, any linear code is amenable to the methods of Sections 3 and 4, provided its rate is high enough so that the number of trellis states is not excessive.

Figure Captions:

Fig. 1: Trellis diagram for the (7,4) Hamming code.
Fig. 2: Shift register encoder for the (7,4) Hamming code.
Fig. 3: Time-invariant trellis diagram for the (7,4) Hamming code.
Fig. 4: State diagram for the (7,4) Hamming code.
Fig. 5: Initial convolutional encoding of $T_2$ information digit sequences.
Fig. 6: The final code block resulting from convolutional encoding of $N_1$ sequences of binary code digits.

[FIG. 5: array of $T_2$ codewords]

[FIG. 6: array with $N_2$ digits per codeword]



II-I 


An Algorithm Determining Free Distance of Convolutional Codes 


The algorithm to be described here works for convolutional codes of all rates $R = k/n$. However, for simplicity of exposition we will confine ourselves to rate $1/n$ binary codes.

It will be useful to take the old-fashioned point of view that the state $S(t)$ of a convolutional encoder at time t is defined by the u immediately preceding information digits

$$S(t) = [i_t, i_{t-1}, \dots, i_{t-u+1}] \qquad (1)$$

and that the encoder output block $x^n = x_1, \dots, x_n$ at time t is a function of $S(t)$ only.

If the code is non-catastrophic then the free distance $d_f$ is equal to the minimal weight of a codeword that corresponds to some information sequence of the form

$$(i_1, i_2, \dots, i_m, 0, 0, \dots), \qquad i_1 = i_m = 1 \qquad (2)$$

where $m = 1, 2, 3, \dots$. We will, of course, restrict our attention to non-catastrophic codes only (tests for possible catastrophic character of codes are simple).

It follows from (2) that free distance will be achieved on a path defined by a state succession $S(1), S(2), \dots, S(m+u-1), S(m+u), \dots$ where

$$S(1) = (1, 0, \dots, 0), \qquad S(m+u-1) = (0, \dots, 0, 1), \qquad S(m+u) = S(m+u+1) = \dots = (0, 0, \dots, 0) \qquad (3)$$
Furthermore, $S(t+1)$ is obtainable from $S(t)$ by a right-shift followed by insertion of $i_{t+1}$ into the leftmost state position ($t = 1, 2, \dots, m+u-2$), and $S(k-1)$ is obtainable from $S(k)$ by a left-shift followed by insertion of $i_{k-u}$ into the rightmost state position ($k = m+u-1, m+u-2, \dots, 2$).

Assume for the time being that we have the following two machines:

a) A right-shifting machine whose starting state is $(1, 0, \dots, 0)$, which searches the trellis in the forward direction: computing outputs, recording their weight, adding the latter to the cumulative weight that corresponds to the path from the root node $(1, 0, \dots, 0)$ to the state in question, and keeping track of the states (regardless of depth) already visited.

b) A left-shifting machine whose starting state is $(0, \dots, 0, 1)$, which searches the trellis in the backward direction (again recording the states visited).

If one of the machines ever reaches a state already reached by the other machine, then a path connection is established whose information digit form is that of (2) and which therefore possibly achieves free distance. This is the main idea of the bi-directional search for $d_f$ being proposed here.

For obvious reasons of economy, both machines should extend low-weight paths first. As a consequence, for a rate $R = 1/n$ code, the memory of each machine will contain at any given time only extendible paths whose weights are $w, w+1, \dots, w+n$.

Both 0 and 1 extensions, $\pi^0$ and $\pi^1$, of a path $\pi$ ending in state $S(t) = (i_t, i_{t-1}, \dots, i_{t-u+1})$ will be generated simultaneously. Let $S^0(t+1) = (0, i_t, \dots, i_{t-u+2})$ and $S^1(t+1) = (1, i_t, \dots, i_{t-u+2})$ be the last states of $\pi^0$ and $\pi^1$, respectively, and suppose (w.l.o.g.) that the right-shifting machine already generated some other path $\pi^*$ whose end state was $S^0(t+1)$. If that path was previously extended, then its cumulative weight at that time could not have exceeded the weight of path $\pi^0$. Hence the path $\pi^0$ can be eliminated from consideration. If, on the other hand, $\pi^*$ was not extended by the time $\pi^0$ is generated, then either $\pi^0$ or $\pi^*$ can be eliminated depending on which has the larger cumulative weight. In fact, suppose $w(\pi^0) < w(\pi^*)$, and the left-shifting machine generates a path $\pi^+$ whose last state is $S^0(t+1)$. Then, obviously, the concatenation $\pi^0, \pi^+$ may correspond to a sequence (2) of least weight, but $\pi^*, \pi^+$ cannot. We therefore conclude that at any given time the memory of the right-shifting machine need contain only paths ending in (live paths) or leading through (dead paths) distinct states. The same remarks, of course, apply to the left-shifting machine.

As a matter of fact, when the search for $d_f$ is carried out by a digital computer, no left- or right-shifting machines need be simulated. All that is necessary is to attach a three-valued flag to each state ever reached from left or right. The flag's value is 'D' if the state was already extended, 'R' if the state is to be extended by a right-shift, and 'L' otherwise (e.g., the flag value of $S(t)$ when it was generated was 'R'. When the extensions $S^0(t+1)$ and $S^1(t+1)$ were generated, their flag values became 'R', and the flag value of $S(t)$ changed from 'R' to 'D').

We are now ready to describe the algorithm. The storage consists of three arrays: the first, S, gives the state; the second, F, the flag value; and the third, W, gives the cumulative weight of the path leading to the state S. $W^*$ will denote the current upper bound on $d_f$. It will originally be set equal to $nu$. If T is a state, $A(T)$ will denote the weight of the output branch corresponding to T.

1. Place $(1, 0, \dots, 0)$ into the first S-location, 'R' into the first F-location, and the weight of the output of $(1, 0, \dots, 0)$ into the first W-location.

2. Place $(0, \dots, 0, 1)$ into the second S-location, 'L' into the second F-location, and the weight of the output of $(0, \dots, 0, 1)$ into the second W-location.

3. Search through memory for a non-'D' location whose W-value is least. Let it be found at location J. If $2W(J) > W^*$, go to 19.

4. Set $T = S(J)$ and $K = 0$ (K is an indicator whose values are 0 and 1). If $F(J) =$ 'L', go to 6.

5. Shift T right and place a 0 into the leftmost position of T. Go to 7.

6. Shift T left and place a 0 into the rightmost position of T.

7. Search through memory for some location I such that $S(I) = T$. If such I exists, go to 13.

8. Find M, the first non-occupied location. Then set $S(M) = T$, $W(M) = W(J) + A(T)$, $F(M) = F(J)$.

9. If $K = 1$, set $F(J) =$ 'D' and go to 3.

10. If $F(J) =$ 'L', go to 12.

11. Place a 1 into the leftmost position of T. Let $K = 1$. Go to 7.

12. Place a 1 into the rightmost position of T. Let $K = 1$. Go to 7.

13. If $F(I) \ne$ 'D', go to 15.

14. Go to 9.

15. If $F(I) \ne F(J)$, go to 18.

16. If $W(J) + A(T) > W(I)$, go to 9.

17. Purge location I, and make it available. Go to 8.

18. If $W^* > W(J) + W(I)$, set $W^* = W(J) + W(I)$. Go to 9.

19. The free distance is $W^*$. Stop.

Figure 1 shows the number of search steps as a function of constraint length u, and compares them with the number of steps involved in the conventional stack-type search. It is seen that on a semi-log plot, the slope of the latter is approximately twice that of the former.

This is just as one would expect: each direction of search need now be carried out only to about half of the depth as formerly, and an exponentially growing tree arrangement exists in both directions.
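The search described above can be sketched in Python. This is our own compact variant, not the report's program. Instead of the S, F, and W arrays with flags 'R', 'L', and 'D', it keeps one dictionary and one priority queue per direction (a bidirectional Dijkstra search), always extends the side with the lighter open path, and stops when the two smallest open weights together reach the best connection found so far, which plays the role of $W^*$. Forward weights include the output of the final state and backward weights do not, so the weight of a connection is simply the sum of the two. Generators are tap tuples aligned with the state (1), and the code is assumed to be non-catastrophic.

```python
import heapq


def output_weight(state, gens):
    """A(T): Hamming weight of the n-digit output block produced in state T."""
    return sum(sum(s & g for s, g in zip(state, gen)) % 2 for gen in gens)


def free_distance(gens):
    """Bidirectional best-first search for d_f of a non-catastrophic rate-1/n code.

    gens -- n generator tap tuples of length u, aligned with
            S(t) = (i_t, i_{t-1}, ..., i_{t-u+1}) as in (1).
    """
    u = len(gens[0])
    start, goal = (1,) + (0,) * (u - 1), (0,) * (u - 1) + (1,)

    def A(s):
        return output_weight(s, gens)

    # "R": weight of S(1), ..., S inclusive.  "L": weight of the states after S up to S(m+u-1).
    dist = {"R": {start: A(start)}, "L": {goal: 0}}
    heaps = {"R": [(A(start), start)], "L": [(0, goal)]}
    done = {"R": set(), "L": set()}
    best = A(start) if start == goal else float("inf")
    while heaps["R"] and heaps["L"]:
        if heaps["R"][0][0] + heaps["L"][0][0] >= best:
            break                          # no connection can beat the current bound
        side = "R" if heaps["R"][0][0] <= heaps["L"][0][0] else "L"
        other = "L" if side == "R" else "R"
        d, s = heapq.heappop(heaps[side])
        if s in done[side]:
            continue                       # stale entry: state already extended ('D')
        done[side].add(s)
        for b in (0, 1):
            if side == "R":                # right shift, new digit enters on the left
                nxt = (b,) + s[:-1]
                nd = d + A(nxt)
            else:                          # left shift, new digit enters on the right
                nxt = s[1:] + (b,)
                nd = d + A(s)
            if nd < dist[side].get(nxt, float("inf")):
                dist[side][nxt] = nd
                heapq.heappush(heaps[side], (nd, nxt))
            if nxt in dist[other]:         # the two searches meet: candidate weight
                best = min(best, nd + dist[other][nxt])
    return best
```

For example, free_distance([(1, 1, 1), (1, 0, 1)]) evaluates the rate-1/2, u = 3 code with generators 7 and 5 (octal). The dictionaries here play the role of the "definite address per state" storage discussed next.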

There is, of course, one obvious difficulty connected with this algorithm: the size of the storage and the search through it. To reduce the former would mean to change the algorithm, but an efficient storage organization to minimize the search is essential. If there are $2^u$ storage locations available, then there is no problem: each possible state is assigned a definite address, and the algorithm simply checks at the appropriate address whether the state in question has already been generated, etc. If the available storage is smaller (its minimal order of magnitude is a direct function of the number of search steps), a more efficient organization is necessary. We have tried some simple hashing schemes which seem to work excellently as long as the occupancy stays below 60%, and we will experiment with tree arrangements involving pointers.

The algorithm applies to rate $R = k/n$ codes as well. There are now $2(2^k - 1)$ initial states, $(10\ldots00\ldots0)$ through $(11\ldots10\ldots0)$ and $(0\ldots00\ldots01)$ through $(0\ldots01\ldots11)$, and every path is extended into $2^k$ paths, one for each possible outgoing branch. Otherwise the algorithm stays the same.





III. REPORT ON PHASE 2 


III-A. The Two-Cycle Algorithm
1. Introduction

In this section we will describe the two-cycle algorithm 
and summarize our analytical results for it. A long paper by 
J.B. Anderson and F. Jelinek entitled "A Two Cycle Algorithm 
for Source Coding with a Fidelity Criterion" going into the 
details was presented at the 1972 International Symposium on 
Information Theory and will be published in the IEEE 
Transactions on Information Theory. 

In the 2-cycle algorithm, the encoder will work in two 
fundamental modes, called cycles, one embedded within the 
other. In the first mode a search is made among tree paths 
to find feasible candidates for encoding of the generated 
information. In the second mode, the candidates are 
concatenated with the help of a push-down stack. 

The operation is, in a way, not too different from that suggested in Jelinek's original proof of the tree coding theorem. What makes analytical evaluation possible and the algorithm desirable (from an encoding effort standpoint) are the kinds of stopping rules introduced to limit the amount of work in each mode.

Assume that codewords for encoding of a binary digit IID source have been arranged in a tree structure. The tree has rate $R = (\log_2 d)/n$, with d branches stemming from each node and n source-approximating binary digits on each branch. The object of an encoder is to find a path of branches through the tree, the digits of which approximate the source sufficiently closely. To measure distance between the source output and various paths, we use the Hamming measure

$$d(z^\ell, \hat{z}^\ell) = \sum_{i=1}^{\ell} \left[1 - \delta(z_i, \hat{z}_i)\right] \qquad (1)$$

where $z^\ell$ is a source sequence, $\hat{z}^\ell$ is a hypothesized path (both of length $\ell$), and $\delta$ is the Kronecker delta function. It should be stressed that our encoder works for other measures and sources as well.

Goodness of individual paths depends on path length as well as distortion and is compared by the algorithm with the help of a path metric,

$$\mu(\hat{z}^\ell) = \ell D^* - d(z^\ell, \hat{z}^\ell) \qquad (2)$$

Since only paths made up of an integral number of branches are of interest, $\ell$ is assumed to be a multiple of n. $D^*$ is the target distortion per encoded source digit desired at the end of encoding, and $D^* > \Delta(R)$, the inverse rate distortion function relative to (1) and the source.

With this path metric in mind, we define two freezing barriers (in the terminology of Gallager), one at metric $a > 0$, the other at $b < 0$. Further extension of paths whose metrics rise above a will be frozen temporarily and the paths removed to the push-down stack (these are the live links), while paths falling below b will be dropped entirely.

A precise description of the algorithm follows:

Step (1) Starting at the code tree root node (which is assigned the metric zero), a freezing cycle is performed: paths are extended in an exhaustive search until all root node descendants cross a freezing barrier and are frozen. Those paths that rise above the a barrier are placed at the top of a push-down stack.

Step (2) When a freezing cycle terminates, attention turns to the push-down stack. The final node of the path at the top of the stack now becomes a root node (metric value 0 assigned) for a new freezing cycle, and the encoder executes Step (1) again. As described in Step (3), the top stack path may occasionally be saved. If the stack is either empty, or its top contains a path made up of a concatenation of L links from Step (1), the encoder passes to Step (3).*



* The push-down stack requires no sorting effort, since paths are inserted as they come and are removed at the top. The resulting stack of paths is thus naturally ordered by the number of live links each path consists of, the longest (in terms of links, but not necessarily branches) being on top. To order paths according to branch lengths is another possibility that may involve extra sorting work. We do not know how to take proper analytical advantage of such an improvement. The fastest way to carry out the freezing cycle would seem to be a Fano-type search that would take the 0-branch extension first until freezing is achieved, and then backtrack. In this way the ordering of live links within each freezing cycle would be lexical. If, on the other hand, all extensions were to be carried out by depth, then the links would be inserted into the stack in the desirable branch-length order.



Step (3) When the push-down cycle defined by Steps (1) and (2) terminates, the encoder releases the output to the user. If an L-concatenation has appeared, it is released directly, and an L-termination is said to have occurred. In the event of an empty stack, the push-down cycle has terminated by extinction. To defend against this, the encoder keeps track of the longest concatenation found by the push-down cycle and returns to it if extinction occurs. Step (1) is performed for the second time beginning on the last node of this path. The first frozen path encountered (it must be at barrier b!) is then concatenated with the saved path and released as the codeword to the user.

Step (4) When an encoding takes place, the push-down stack is purged and the last node of the obtained codeword is inserted into the stack. The latter then constitutes a new root node for further operation of the encoding algorithm.

Step (1) constitutes the freezing cycle, and Steps (1) and (2)
together are the push-down cycle. Step (3) implies release of accumulated
output, and the time between successive executions of this step is the
delay in encoding. The analysis of our algorithm is an interesting one
in itself, but the scheme has several practical advantages. The freezing
cycle need not be extensive, and far less time is spent scrutinizing
codewords than with the Jelinek stack algorithm. In general, efficiency
and simplicity are well combined.
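Steps (1) to (3) amount to a depth-first growth of the stack tree on a push-down (last-in, first-out) stack. The sketch below is hypothetical. Here `freeze(node)` stands in for a complete freezing cycle and simply returns the end nodes of the live links it found. The second freezing cycle that the report runs after extinction, to collect the single dead link, is left out.

```python
def push_down_cycle(root, freeze, L):
    """Push-down cycle of Steps (1)-(3), sketched over an abstract stack tree.

    freeze(node) is a hypothetical stand-in for one freezing cycle started at
    `node`; it returns the end nodes of the live links found, in order.
    Returns (chain, extinct): chain is the tuple of stack-tree nodes from the
    root, i.e. an L-concatenation, or after extinction the longest one seen.
    """
    stack = [(root,)]      # push-down stack of concatenations of live links
    longest = (root,)
    while stack:
        chain = stack.pop()               # top of the stack is the deepest chain
        if len(chain) > len(longest):
            longest = chain
        if len(chain) - 1 == L:           # L live links concatenated
            return chain, False           # L-termination
        sons = freeze(chain[-1])          # Step (1): one freezing cycle
        # Step (2): push the sons so that the first son found ends up on top
        stack.extend(chain + (s,) for s in reversed(sons))
    return longest, True                  # empty stack: extinction


# Toy stack tree in which every node has exactly two live sons.
chain, extinct = push_down_cycle(1, lambda v: [2 * v, 2 * v + 1], 3)
```

Because sons are always one link deeper than their parent, the last-in, first-out discipline keeps the longest concatenations on top, as the ordering discussion above requires.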

Before proceeding with an analysis, we pause to develop further
terminology and identify quantities of interest. The language of tree
structures is well suited to our discussion, except that the two-cycle
algorithm contains two tree structures, one "within" the other, which
are easily confused. Accordingly, let the code tree paths be made up
of branches, of which d stem from each node, but let the tree structure
diagramming the push-down stack development consist of links. In this
tree, sons of a node are formed by a freezing cycle, their number being
a random variable, and paths of links represent concatenations of the
"good paths" alluded to above. Corresponding to each link is a link
length in branches of the code tree, and a stack tree node has sons
equal in number to the code tree paths frozen at a during some freezing
cycle. The subject of code trees is well known, and the growth of the stack
tree, a process we call a push-down stack searched branching process,
will be estimated in Section 3. The process terminates either by
extinction or by L-termination.

2. Quantities of Interest in the Two-Cycle Algorithm 

We now discuss quantities of interest in the operation of the
two-cycle algorithm: computation per source digit encoded, computation
per freezing cycle, freezing cycles per push-down cycle, probability
of termination by extinction, concentration of work in one or the other
cycle, and of course, the distortion attained. All of these eventually
must depend on the three parameters of the algorithm, a, b, and L.

Let the term live link refer to an a-frozen link, and dead link
to the occasional b-frozen link (recall the push-down process involves
a-frozen links only). Let the path that constitutes the codeword
released to the user be referred to as the chosen link path. Let the
latter be of length ℓ, and let X_i be the branch length of the i-th link.
Let Y be the branch length of the last (and only!) dead link, if any,
of the chosen path. Then the chosen path branch length M is given by
(ℓ is a random variable not exceeding L)

$$ M = \sum_{i=1}^{\ell} X_i + Y\,[1 - \delta(\ell, L)] \qquad (3) $$

where δ(ℓ, L) = 1 if ℓ = L and 0 otherwise, and the total distortion
incurred in encoding is

$$ D_{\mathrm{Tot}} = M D^* - \ell a - [1 - \delta(\ell, L)]\, b, \qquad b < 0 \qquad (4) $$
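Equations (3) and (4) can be evaluated for a single released codeword in a few lines. One way to read (4) is link by link. Each live link, whose metric reached a, contributes X_i D* - a. The dead link, when present, contributes Y D* - b. The function name and the numbers in the example are illustrative only, and D* is used per unit of M exactly as it appears in (4).

```python
def chosen_path_totals(X, Y, L, D_star, a, b):
    """Branch length M from (3) and total distortion D_Tot from (4).

    X: branch lengths X_1..X_l of the live links on the chosen link path.
    Y: branch length of the single dead link, counted only when l < L.
    """
    l = len(X)
    not_L = 0 if l == L else 1          # 1 - delta(l, L)
    M = sum(X) + Y * not_L
    D_tot = M * D_star - l * a - not_L * b
    return M, D_tot


# Extinction case: two live links, then one dead link (l = 2 < L = 5).
M, D_tot = chosen_path_totals([4, 6], 3, L=5, D_star=0.5, a=1.0, b=-1.0)
```

In the example, D_Tot equals the sum of the per-link terms (4·0.5 - 1) + (6·0.5 - 1) + (3·0.5 + 1) = 5.5.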


Let W_i be the computation performed in the code tree during the i-th
freezing cycle, and let V be the number of freezing cycles necessary
to complete a push-down cycle. Then U, the total computation expended
in a push-down cycle, is

$$ U = \sum_{i=1}^{V} W_i \qquad (5) $$


Among our interests is the relation between the average distortion
per encoded source digit E[D_Tot/M] and the average work per
encoded source digit E[U/M]. Under suitable conditions, satisfied
in this case,

$$ E\left[\frac{D_{\mathrm{Tot}}}{M}\right] \approx \frac{E[D_{\mathrm{Tot}}]}{E[M]} \qquad (6) $$

$$ E\left[\frac{U}{M}\right] \approx \frac{E[U]}{E[M]} \qquad (7) $$


Let q_i be the probability that the push-down cycle terminates by
extinction before any link on tree level i has been generated. Clearly
q_i ≤ q_{i+1}; let

$$ q = \lim_{i \to \infty} q_i . $$

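It can be shown that a proper choice of a > 0 and b < 0 results in
q < 1. Assuming that to be the case, let us choose L to satisfy

$$ L > \frac{-b\,q}{a\,(1-q)} \qquad (8) $$

Then from (4) and (6)

$$ E\left[\frac{D_{\mathrm{Tot}}}{M}\right] \approx D^* - \frac{a\,E[\ell] + b\,E[1-\delta(\ell,L)]}{E[M]} \qquad (9) $$

But

$$ E[1-\delta(\ell,L)] = q_L \qquad (10) $$

and

$$ E[\ell] = \sum_{j=1}^{L-1} (q_{j+1} - q_j)\, j + (1-q_L)\, L \qquad (11) $$

Hence, using (8), (10), and (11)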


$$ -aE[\ell] - bE[1-\delta(\ell,L)] < -a(1-q_L)L + aL\,\frac{(1-q)\,q_L}{q} = \frac{aL\,(q_L - q)}{q} \le 0 \qquad (12) $$

It follows that L chosen as in (8) causes

$$ E\left[\frac{D_{\mathrm{Tot}}}{M}\right] < D^* \qquad (13) $$
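The argument leading to (12) can be checked numerically. The sketch below assumes a hypothetical monotone extinction sequence q_j = q(1 - 2^(-j)) rather than one computed from a real push-down process. It picks the smallest integer L allowed by the condition L > -bq/(a(1 - q)), evaluates E[ℓ] from (11), and compares the left side of (12) with its bound.

```python
import math


def smallest_L(q, a, b):
    """Smallest integer L with L > -b*q / (a*(1-q)), the choice (8)."""
    return math.floor(-b * q / (a * (1.0 - q))) + 1


def expected_links(q_seq, L):
    """E[l] from (11); q_seq[j] holds q_j for j = 0..L, with q_0 = 0."""
    partial = sum((q_seq[j + 1] - q_seq[j]) * j for j in range(1, L))
    return partial + (1.0 - q_seq[L]) * L


def check_12(q_seq, q, L, a, b):
    """Return the left side of (12), using (10), and its bound a*L*(q_L - q)/q."""
    lhs = -a * expected_links(q_seq, L) - b * q_seq[L]
    return lhs, a * L * (q_seq[L] - q) / q


q, a, b = 0.4, 1.0, -5.0                       # illustrative values only
L = smallest_L(q, a, b)
q_seq = [q * (1 - 0.5 ** j) for j in range(L + 1)]
lhs, bound = check_12(q_seq, q, L, a, b)
```

Here L = 4, E[ℓ] = 2.775, and the left side of (12), -0.9, lies below the bound -0.25, which is itself negative, as (13) requires.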


In fact, the computation in successive freezing cycles is independent
under our assumptions, so by Wald's Lemma,

$$ E[U] = E[V]\,E[W] \qquad (14) $$

and

$$ E[M] = E[\ell]\,E[X] + q_L\,E[Y] \ge (1-q)\,L\,E[X] \qquad (15) $$

where we have made use of (11). Hence

$$ E\left[\frac{U}{M}\right] \approx \frac{E[U]}{E[M]} \le \frac{E[V]\,E[W]}{L\,(1-q)\,E[X]} \qquad (16) $$

A characteristic of push-down stack searched branching processes
is that the underbound of (11) is quite tight, so that the bounds
(13) and (16) are also tight. Thus, (16) gives the computation
required to produce distortion D*.
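Wald's Lemma, used in (14), holds even though V is decided by the freezing cycles themselves, provided V is a stopping time for the sequence W_1, W_2, .... A toy Monte Carlo check illustrates the point. The die rolls below stand in for freezing-cycle computations, which is purely an assumption of this sketch.

```python
import random


def wald_demo(trials=100_000, seed=0):
    """Monte Carlo check of E[U] = E[V] E[W] for U = W_1 + ... + W_V.

    Toy stand-in: each W_i is a fair die roll (E[W] = 3.5) and V is the first
    i with W_i = 6, a stopping time that depends on the W's themselves.
    Returns the sample means of U and V.
    """
    rng = random.Random(seed)
    total_U = total_V = 0
    for _ in range(trials):
        while True:
            w = rng.randint(1, 6)
            total_U += w
            total_V += 1
            if w == 6:
                break
    return total_U / trials, total_V / trials


mean_U, mean_V = wald_demo()
```

The averages come out close to E[V] = 6 and E[V]E[W] = 21, even though the last roll of every trial is always a 6.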

Since q is a function of a and b only, a, b, and L are all
implicitly present in (16). It turns out that certain choices of
a, b, and L decrease the computation in one cycle at the expense of
the other (e.g., smaller freezing cycles, but more of them, or vice
versa). Obviously, some combination minimizes the bound (16) while
preserving the validity of (13). To complete our analysis, we must
study

i) E[W], the expected number of computations in a freezing cycle

ii) E[V], the expected number of freezing cycles in a push-down cycle

iii) q, the probability of extinction in a push-down cycle that
has L = ∞

iv) E[X], the expected branch length of a live link

v) Choices of a, b, and L


3. Summary of Analytical Results for the Two-Cycle Algorithm 

For this progress report, we summarize briefly the analytical 
results that have been obtained to this date. Only the simplest equations 
and no proofs will be given. A full length report on the two-cycle 
algorithm will be forthcoming. 

i) Expected freezing cycle computations 

In the code tree, let

N_a = Number of paths frozen at a-barrier (i.e., live links)

N_b = Number of paths frozen at b-barrier (i.e., dead links)

N_∞ = Number of paths remaining forever unfrozen

Then the following theorem is true:

Theorem 1 For a tree with rate R = (log_2 d)/n used to encode binary
IID sources with respect to the Hamming distortion measure.
whenever |b - a| < π/w. s = r e^{jw} is the possibly complex
solution to

$$ 2^{1-R} = s^{D^*-1} + s^{D^*} \qquad (18) $$

w and r are functions of D* and R only.

w decreases to 0 as D* decreases to Δ(R), and r is typically near (1 - D*)/D*. A careful
look at (17) reveals that as |b - a| tends to π/w, both EN_a and EN_b
tend to infinity. In fact, given an a one may choose b to make the
right hand side of (17a) precisely unity. In this way, R, D*, and a
specify a minimal |b| necessary to achieve EN_a > 1. We can state this
as a

Corollary 1 For any given a < π/w, there exists b* such that if |b - a|
< π/w and b < b*, then EN_a > 1.

As a rule, b* is very near π/w. A second corollary will give us
the desired result for E[W]. As is customary, let one computation
include the generation and scrutiny of d branches stemming from their
common parent node. Then an exercise in tree branch topology yields

Corollary 2 E[W] = (EN_a + EN_b - 1)/(d - 1)

The significance of EN_a > 1 is given by Theorem 2, which amounts
to a coding theorem proved by the device of a two-cycle algorithm:

Theorem 2 Under the hypotheses of Theorem 1, whenever EN_a > 1 and
D* > Δ(R), the two-cycle algorithm along with some source code
will perform arbitrarily close to D* for some L.


ii) Expected freezing cycles per push-down cycle 


An effective means of analysis has been found for the push-down
stack that shows, among other things, the surprising theorem to follow.
Let the distribution {p_k} be defined by

p_k = Pr[N_a = k] = probability of k sons of a stack tree node

Theorem 3 For any distribution {p_k} such that Σ_k k p_k > 1 (i.e.,
EN_a > 1), the expected number of son formations, E[V], necessary
to terminate a push-down cycle is overbounded by

$$ E[V] < L \qquad (19) $$

(Recall that L is the termination depth of the cycle when extinction does
not occur, but the expectation is over either termination.)

We conjecture that (19) is a tight overbound.

iii) Probability of push-down cycle extinction 

In the event of extinction, the push-down cycle behaves identically
to an ordinary branching process. Exploiting this relationship gives q.
In particular, whenever EN_a < 1, the monotone increasing sequence {q_i}
has limit 1, so that for large L, extinction occurs with probability
1. When EN_a > 1, q is the solution in [0, 1) of the polynomial equation

$$ q = \sum_{k=0}^{\infty} p_k\, q^k \qquad (20) $$

It remains only to find the distribution {p } , and it turns out that 
each P k (a,b) is the solution of a linear difference equation with non- 
constant coefficients. These equations are easy to solve numerically, 
although much more complicated analytical methods are available also. 
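Since q_i is the probability that the process has died out before level i, the sequence {q_i} can be generated by iterating the generating function of {p_k} from q_0 = 0. This is the standard branching-process recursion, and the iterates increase monotonically to the root of (20). A minimal sketch follows, with a made-up son distribution.

```python
def extinction_sequence(p, steps):
    """Return [q_0, q_1, ..., q_steps] with q_0 = 0 and q_i = f(q_{i-1}).

    f(s) = sum_k p[k] * s**k is the generating function of the number of
    sons of a stack-tree node; q_i is the probability of extinction before
    level i of the stack tree.
    """
    def f(s):
        return sum(pk * s ** k for k, pk in enumerate(p))

    qs = [0.0]
    for _ in range(steps):
        qs.append(f(qs[-1]))
    return qs


# Hypothetical son distribution p_0, p_1, p_2 with EN_a = 1.25 > 1.
qs = extinction_sequence([0.25, 0.25, 0.5], 200)
```

For this distribution the iterates climb from q_1 = 0.25 toward q = 0.5, the root of (20) in [0, 1).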
iv) Expected length of live links

Recursions are now available to find the expected length of a live
path searched out by the freezing cycle. These recursions also allow
the study of freezing cycles with a length restriction on searching
in the code tree. Such a feature is important as a practical matter
to ensure the steady operation of the encoder. For lack of time,
numerical analysis of these recursions has not yet been undertaken.

v) Choices of a, b, L 

Intensive work on this problem is awaiting further numerical 
analysis. Increasing a will increase q and increasing |b| will have
the opposite effect. Simultaneous increase in a and |b| will increase
E[W] but might conceivably decrease L (see (8)). The point is that the
amount of work in the push-down cycle might be traded for work in the 
freezing cycle, and there will exist some optimal balance that we shall 
seek to discover. 



III-B. The Stack Algorithm for Source Coding 


The stack algorithm is a scheme that uses tree codes to
encode source data with respect to a fidelity criterion. It
stems directly from the Jelinek stack algorithm [1] for sequential
channel decoding, but differs radically in its analysis.

In terms of code tree branches searched per digit output, it is
the most efficient algorithm known to the authors (see [2], [3],
[4]). The algorithm suffers, however, from clumsy data handling
and large storage.

The stack algorithm is simple to describe and consists of
one repeated basic operation, the stack augmentation. Hypothetical
code tree paths ẑ_k of varying lengths k, ordered by the
usual metric

$$ \mu(\hat z_k) = kD^* - d(u_k, \hat z_k) \qquad (1) $$

reside in a stack. From the top path in the stack, the d branches
stemming from its final node are extended to form d new
paths. Stacking these in order of metric, the algorithm completes
an augmentation. Repetition continues until a stopping
rule intervenes.

Suppose the algorithm stops and releases output when a
path exceeds metric A > 0 for the first time, that is, when
the "top" of the ordered stack exceeds A. We can imagine a
bottom limit B < 0 below which all paths are dropped from the
stack, and a limit t on the length of tree paths stored in the
stack. Our analysis is sufficient for this generality, but for
simplicity consider a stack of infinite capacity to store nodes,
with B = -∞ and t = ∞. With these assumptions, the average
stack storage in branches is identical to the expected number of
nodes scrutinized by the algorithm, since no paths are ever
dropped. Furthermore, if this expectation is EN(A,B), with
B = -∞, then the number of nodes searched per branch released
as output, over many stack searches, is

$$ E[\text{Nodes per branch}] = \frac{EN(A, -\infty)}{EL} \qquad (2) $$

where EL is the expected length of a released path. The expected
distortion of this path will depend on A as well as D*, and is

$$ E[\text{Dist. per branch}] \approx nD^* - \frac{A}{EL} \qquad (3) $$

Similar, but more complicated, equations hold if B and t are 
not indefinitely large. 
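A small simulation of stack searches with B = -∞ and no length limit t shows how the ratio in (2) can be estimated in practice. The sketch makes several assumptions. It uses a binary IID source and a random binary tree code (d = 2) with n source digits per branch. The per-branch metric increment is nD* minus the number of mismatches, and ties are broken in favour of longer paths. `max_augmentations` is a safety cap not present in the report.

```python
import heapq
import random


def stack_search(source, n, D_star, A, rng, d=2, max_augmentations=100_000):
    """One stack search with B = -infinity and unlimited path length (sketch).

    Paths are ordered by metric (per branch: n*D* - Hamming mismatches),
    ties going to the longer path; the top path is released as soon as its
    metric exceeds A.  Returns (released path, its metric, nodes generated).
    """
    labels = {}  # hypothetical random tree code: path -> n-bit branch label
    heap = [(-0.0, 0, 0, ())]  # (-metric, -length, insertion order, path)
    order, nodes = 1, 0
    for _ in range(max_augmentations):
        neg_metric, _, _, path = heapq.heappop(heap)
        metric = -neg_metric
        if metric > A:                      # stopping rule: top exceeds A
            return path, metric, nodes
        k = len(path)
        if (k + 1) * n > len(source):
            raise ValueError("source too short for this search")
        segment = source[k * n:(k + 1) * n]
        for branch in range(d):             # one stack augmentation
            child = path + (branch,)
            if child not in labels:
                labels[child] = [rng.randint(0, 1) for _ in range(n)]
            mismatches = sum(x != y for x, y in zip(segment, labels[child]))
            m = metric + n * D_star - mismatches
            heapq.heappush(heap, (-m, -len(child), order, child))
            order += 1
            nodes += 1
    raise RuntimeError("no release within max_augmentations")


rng = random.Random(3)
total_nodes = total_branches = 0
for _ in range(20):
    src = [rng.randint(0, 1) for _ in range(400)]
    path, metric, nodes = stack_search(src, n=2, D_star=0.2, A=2.0, rng=rng)
    total_nodes += nodes
    total_branches += len(path)
ratio = total_nodes / total_branches   # estimate of E[Nodes per branch], (2)
```

Every node on a released path of length k has been expanded, so each search generates at least d·k nodes and the estimated ratio can never fall below d.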

Our analytical method is to identify the tree search with
linear and non-linear difference equations, and then approximate
these. The non-linear equations predominate, unfortunately, and
the stack sorting will require a careful mathematical model.
Quantities needed will be the average nodes searched EN(A,B),
the average length released EL, and the probability distribution
of the top-of-stack minimum (TSM). The latter describes how low
the metric of the best stack path drops before some path is
finally released.

Define the function G(y) by means of its d th power to be 


G (y) = P) Forward of some node n 0 

l TSM 

B< y< A 


£< n o> * y] 


w 


n_0 can be any node encountered during the stack search, and
μ(n_0) represents the value of (1) at that node. Then one can
show that G(·) satisfies the non-linear difference equation with
constant coefficients

$$ G(y) = \sum_m p(\mu_m)\, G^d(y + \mu_m), \qquad B < y < A \qquad (5) $$

{μ_m} is the set of metric increments that can appear in
the tree code, and p(·) is their distribution. G(·) gives the
distribution of the stack top, but turns out to be far more
important than that. As we shall now see, every stack quantity
is directly related to G(·), and the study of the algorithm consists
almost entirely of manipulating this function.

After a careful derivation, taking into account the stack
sorting, one gets that

3 

en(a,b) = 2Zm(J/A+J) (6) 

3-1 

where the family of functions {M(·/i)} are solutions of linear
difference equations with non-constant coefficients of the form

$$ M(y/i) = d\, G^{d-1}(y/i) \sum_m p(\mu_m)\, M(y + \mu_m / i) + C(y/i) \qquad (7) $$

An equation (7) exists for each i, i = 1, ..., I(A,B). I(A,B) is a
finite integer function of A and B. All I(A,B) solutions are
needed to compute (6). C(·/i) is calculated from G(·) functions.
G(·/i) is the solution of (5) for certain boundaries specified
by i.

A final derivation yields that

$$ EL = \sum_{l=0}^{\infty} G_l^d(0) \qquad (8) $$

assuming the stack search begins at a root node with metric 0.
The {G_l(·)} are obtained from iterations of the recursion

$$ G_l(y) = \sum_m p(\mu_m)\, G_{l-1}^d(y + \mu_m), \qquad B < y < A \qquad (9) $$

$$ G_0 = 1 \quad \text{(boundaries as in (5))} $$

which provides incidentally a numerical means to solve (5),
since it can be shown that G_l(y) → G(y).

Using these equations (4)-(9), extensive numerical studies
have been conducted for a stack algorithm using a randomly
chosen tree code to encode the binary IID source with Hamming
fidelity criterion d(z, ẑ) = 1 - δ(z, ẑ). In addition, a FORTRAN
stack encoder has simulated the same situation. To summarize
these results, observe that distortion is a function of both D*
and A. If one optimizes A and D* for smallest storage, A will
be as small as possible, with D* as a consequence very near the
distortion desired from the algorithm. On the other hand, optimizing
with respect to branch computation requires a larger A
and a D* somewhat above the final distortion.

It turns out that the stack search involves far fewer
tree branches per digit released as output than any other scheme
studied by the authors [2], [3]. But this strong advantage is
balanced by several disadvantages. Both computation and length
released vary widely from search to search, and storage is large.

A difficulty of another sort, encountered during simulation, is 
sorting effort. After each augmentation, d new paths must be 
sorted into the stack in order of metric, and among paths of the 
same metric, in order of length. In general, this is not easily 
done. New paths typically are inserted far down into the stack, 
particularly if some of the branch increments are reasonably 
negative, since many other paths usually have metrics nearer the 
best. 

Overall, it appears that the efficiency in branches
searched is outweighed by this clumsy sorting. Algorithms such
as the M-algorithm [2] and the 2-cycle algorithm [3] have proved
faster in simulation thus far, and simpler to implement. But
improvements in all the algorithms are always a possibility,
and the subject is not closed.
III-C. Development of a Stack Algorithm for Tree Encoding of a
Gaussian Source with a Mean Square Fidelity Criterion 

1. Introduction 

The most general theoretical formulation of the data compression
problem was provided by Shannon in 1959 in his paper "Coding Theorems
for a Discrete Source with a Fidelity Criterion" [1]. He enlarged there
on his 1949 source coding ideas [2] referred to in the literature as
variable length source coding and block source coding. Concisely
stated, Shannon's results are as follows: let a memoryless source
of alphabet A = {0, 1, ..., a-1} governed by the probability distribution
Q(z), z ∈ A, be given. Let an approximation of the source outputs in the
reproducer alphabet B = {0, 1, ..., b-1} be desired (in practice b < a)
with an attached additive per letter distortion criterion d(z, ẑ) defined for
all pairs z ∈ A, ẑ ∈ B (i.e., the distortion between sequences
z^n = z_1, ..., z_n and ẑ^n = ẑ_1, ..., ẑ_n is defined to be
d(z^n; ẑ^n) = Σ_{i=1}^{n} d(z_i, ẑ_i)).

Let Ψ_n(·) be an encoding function that assigns some reproducer
sequence ẑ^n to each possible source sequence z^n. The rate of the
resultant code is defined to be R = (log γ_n)/n, where γ_n denotes the
number of sequences in the range of Ψ_n(·). Shannon shows the
existence of a rate distortion function R(D) [whose shape depends on
Q(·) and d(·,·) only] that has the following properties:

a) for all n and all codes Ψ_n, if R < R(D) then the expected
distortion E[(1/n) d(z^n; Ψ_n(z^n))] > D.
b) for R ≥ R(D) there exists a sequence of codes
Ψ*_n of rate (log γ*_n)/n ≤ R such that

E[(1/n) d(z^n; Ψ*_n(z^n))] → D.


In recent years much work has been done generalizing the above 
results to a broader class of sources, evaluating the performance of existing 
systems relative to the achievable optimum, and developing methods 
for evaluation of the R(D) function. The first consideration of the 
actual coding problem was undertaken by Jelinek [3] who showed that
the sequence of coding functions Ψ*_n can possess the above desirable
properties even if it is restricted to generate tree codes (instead of 
block codes to which Shannon's theorem applies). It was hoped that a 
tree code structure would facilitate the development of computationally 
feasible encoding algorithms. 

The present report concerns the performance of two such algorithms
as applied to the restricted case of the time discrete Gaussian memoryless
source [with probability density

$$ Q(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad x \text{ real} $$

] and the squared error criterion [d(z, ẑ) = (z - ẑ)²].

For this case the R(D) function is R = -½ log₂ D (0 < D ≤ 1). Furthermore,
for this case it can be shown that any sequence of codes Ψ*_n with rates
(log γ*_n)/n → R(D) and distortions E[d(z^n, Ψ*_n(z^n))/n] → D must have the
average conditional distortions

$$ \frac{1}{n} \sum_{k=1}^{n} E\big[d(z_k, \psi^*_{n,k}(z^n)) \,\big|\, z_k = x\big] \to D(1-D) + D^2 x^2 $$

almost everywhere in x, where z_k is the kth element of z^n and
ψ*_{n,k}(z^n) is the kth element of Ψ*_n(z^n).

An example of a tree code with 4 branches per node and two initial
states is given in Figure 1. The various codewords are the sequences
associated with the 2 × 4² = 32 different paths of the tree. For a tree
with b_0 initial states and b branches per node a path of length ℓ is
specified by a map sequence s^ℓ = (s_0, s_1, ..., s_ℓ) where the s_i's are
non-negative integers, s_0 ≤ b_0 - 1, s_i ≤ b - 1 for i = 1, 2, ..., ℓ.
This map sequence determines which initial state was taken and at each
node level determines if the first (0), second (1), ..., or bth (b - 1)
branch was taken. Thus for the tree of Figure 1 the map sequence
s² = 112 corresponds to the codeword ẑ² = (-0.87, 0.60). The rate
of the code of Figure 1 is R = ½ log₂ 32 = 2.5 bits.

A convenient method of filling the tree is by means of a finite
state tree encoder. In this method each branch in the tree is
associated with a state as follows: branch s_j of path s^j = (s_0, s_1, ..., s_j)
is assigned state (τ(j), t(s^j)), where time state τ(j) = j (modulo r)
and branch state

$$ t(s^j) = \Big(s_0\, b^j + \sum_{i=1}^{j} s_i\, b^{j-i}\Big) \pmod{m} $$

and the period r and number of branch states m are positive integers.
Then each state is given an element of the reproducer alphabet and
each branch is given the element assigned to its state. An example of
a finite state tree encoder with r = 2 and m = 8 is shown in Table 1. This
code gives the tree of Figure 1 when used to fill a tree with b_0 = 2
and b = 4. For example, path 112 has states

(1 (modulo 2), (1 × 4 + 1) (modulo 8)) = (1, 5) and
(2 (modulo 2), (1 × 4² + 1 × 4 + 2) (modulo 8)) = (0, 6)

and therefore has the codeword (-0.87, 0.60).
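The finite state encoder of Table 1 can be expressed directly in code. The sketch computes branch states with the recursion t(s^j) = (t(s^{j-1}) b + s_j) (modulo m), which agrees modulo m with the closed form above, and looks the letters up in Table 1. The entry for state (1, 5) is taken as -0.87 from the worked example for path 112, and the function names are illustrative.

```python
# Table 1 (r = 2, m = 8): state (time state, branch state) -> reproducer letter
TABLE_1 = {
    (0, 0): -0.72, (0, 1): 0.30, (0, 2): 1.38, (0, 3): -0.32,
    (0, 4): 1.32, (0, 5): -0.92, (0, 6): 0.60, (0, 7): -1.28,
    (1, 0): 0.38, (1, 1): -0.69, (1, 2): -0.97, (1, 3): 0.76,
    (1, 4): 1.32, (1, 5): -0.87, (1, 6): 0.37, (1, 7): 0.10,
}


def branch_states(path, b=4, r=2, m=8):
    """States (tau(j), t(s^j)) of branches j = 1..len(path)-1 of (s_0, s_1, ...)."""
    t = path[0] % m
    states = []
    for j, s in enumerate(path[1:], start=1):
        t = (t * b + s) % m          # t(s^j) = (t(s^{j-1}) b + s_j) mod m
        states.append((j % r, t))
    return states


def codeword(path, table=TABLE_1, b=4, r=2, m=8):
    """Reproducer letters along the path, one per branch."""
    return tuple(table[st] for st in branch_states(path, b, r, m))


print(codeword((1, 1, 2)))   # path 112 of Figure 1
```

The same helper shows that paths 012, 032, 112, and 132 all end in state (0, 6) at level 2, the equivalence exploited by the Viterbi algorithm.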

It is not known how to find the best code given R, D, r, m. However,
it can be shown that for a tree with b = 2^{R(D)} branches per node, if the
states are assigned real numbers independently at random with
probability density

$$ P(\hat z) = \frac{1}{\sqrt{2\pi(1-D)}} \exp\!\Big[-\frac{\hat z^2}{2(1-D)}\Big] $$

then with probability one in the limit of large r, m and large tree depth the
resulting code is optimal in the following sense: the expectation over
all source output sequences of the average distortion along the best
path for each source sequence is arbitrarily close to D.

A question still remaining is how to search the tree efficiently to 

find good paths. Two algorithms for doing this will now be described. 

Since t(s^j) = (t(s^{j-1}) × b + s_j) (modulo m), the state of a branch
determines the states of all branches deriving from it. Consequently,
branches at the same level with the same state are identical for
coding purposes. Thus for example in the tree code of Figure 1,
paths 012, 032, 112, and 132 all have state (0, 6) at level 2 and are
therefore equivalent there. Thus for a memoryless source a choice
from any set of paths in encoding a given source output should depend
only on their distortions up until the time they reach the same state.

Table 1. An example of a finite state tree code with period r = 2 and
number of branch states m = 8.

    State   Representation        State   Representation
    (0,0)       -0.72             (1,0)        0.38
    (0,1)        0.30             (1,1)       -0.69
    (0,2)        1.38             (1,2)       -0.97
    (0,3)       -0.32             (1,3)        0.76
    (0,4)        1.32             (1,4)        1.32
    (0,5)       -0.92             (1,5)       -0.87
    (0,6)        0.60             (1,6)        0.37
    (0,7)       -1.28             (1,7)        0.10

This property is used by an exhaustive search algorithm
known as the Viterbi algorithm: encoder states are grouped into
equivalence classes T_i defined by T_i = {t : bt = i (modulo m)}, i = 0, 1, ..., m - 1.
The algorithm proceeds by successive elimination and operates with all
paths of the same length.

All one-branch extensions of all paths still being considered are
found and their distortions are computed. For each i, all paths
ending in states in class T_i are compared and all but the one with the
smallest total distortion are eliminated from further consideration.
This process is repeated for each level until a given stopping level is
reached. Then all remaining paths are compared and the one with the
smallest total distortion is chosen to be the encoder output.
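The Viterbi search above can be sketched for a small finite state code. The code letters here are hypothetical N(0,1) draws rather than the Table 1 values, and the stopping level is taken as the block length. Paths in one class T_i have identical futures, so the distortion returned should equal the best distortion found by exhaustive search over all paths of the same length.

```python
import itertools
import random


def random_code(r, m, rng):
    """Hypothetical finite-state code: an independent N(0,1) letter per state."""
    return {(tau, t): rng.gauss(0.0, 1.0) for tau in range(r) for t in range(m)}


def path_distortion(z, path, code, b, r, m):
    """Squared-error distortion of the codeword for path (s_0, s_1, ..., s_J)."""
    t, total = path[0] % m, 0.0
    for j, (zj, s) in enumerate(zip(z, path[1:]), start=1):
        t = (t * b + s) % m
        total += (zj - code[(j % r, t)]) ** 2
    return total


def viterbi(z, code, b0, b, r, m):
    """Keep one survivor per class T_i = {t : b*t = i (mod m)} at each level."""
    survivors = {}
    for s0 in range(b0):
        key = (b * (s0 % m)) % m
        cand = (0.0, (s0,), s0 % m)          # (distortion, path, branch state)
        if key not in survivors or cand[0] < survivors[key][0]:
            survivors[key] = cand
    for j, zj in enumerate(z, start=1):
        extended = {}
        for dist, path, t in survivors.values():
            for s in range(b):
                t2 = (t * b + s) % m
                cand = (dist + (zj - code[(j % r, t2)]) ** 2, path + (s,), t2)
                key = (b * t2) % m
                if key not in extended or cand[0] < extended[key][0]:
                    extended[key] = cand
        survivors = extended
    dist, path, _ = min(survivors.values())  # compare all remaining paths
    return path, dist


rng = random.Random(0)
code = random_code(2, 8, rng)
z = [rng.gauss(0.0, 1.0) for _ in range(3)]
path, dist = viterbi(z, code, b0=2, b=4, r=2, m=8)
```

With b = 4 and m = 8 only two classes survive at each level (4t mod 8 is 0 or 4), which shows how much the state equivalence can prune.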

Another search algorithm, known as the stack encoding algorithm [4],
operates as follows:

Let D* be the per letter distortion desired by the user. To be
realistic (see the previously quoted results) we must have R > R(D*).

Define a metric distortion function d*(z, ẑ) = d(z, ẑ) - (A + Bz²), where
A and B, with A + B = D*, are parameters to be adjusted. For example, a choice
of metric matched to the limit of the performance of the best
possible codes would be R = R(D*), A = D*(1 - D*), B = (D*)². Then ẑ^ℓ
will be an acceptable approximation of a source sequence z^ℓ if and
only if

$$ \sum_{j=1}^{\ell} d^*(z_j, \hat z_j) \le 0 $$


(we assume that the code is indefinitely extensible, i.e., that the
number of levels in the tree is practically infinite). Suppose the
sequence z^n (n large) was generated by the source, let d*(s^j) denote the
metric relative to z^n corresponding to the last branch of the path
s^j [e.g., d*(112) = d*(z_2, 0.60) and d*(113) = d*(z_2, -1.28) for the code
of Figure 1], and let D(s^j) be the cumulative metric along the path s^j,

$$ D(s^j) = \sum_{i=1}^{j} d^*(s^i), $$

where the s^i are the initial subsequences of length i of s^j (i ≤ j). The
stack will contain different paths s^j and their cumulative metrics
D(s^j), and will be arranged in ascending order of the latter (i.e., at
the top of the stack there will be that path s^j whose D(s^j) is least).

1. At the beginning of the encoding process, the paths 0, 1, ..., b_0 - 1
are assigned zero cumulative distortion and arranged in the stack in any
order (e.g., numerical order).

2. The encoder checks whether the path s^j on top of the stack is
such that j is greater than some stopping value. If so, go to step 4; if
not, go to step 3.

3. The top entry [s^j, D(s^j)] is eliminated from the stack, the
branch metrics d*(s^j 0), d*(s^j 1), ..., d*(s^j (b - 1)) are computed, and b
new entries [s^j k, D(s^j k) = D(s^j) + d*(s^j k)], k = 0, 1, ..., b - 1, are
inserted in the proper locations in the stack. Go to 2.

4. The sequence z^j is encoded into the codeword ẑ^j that corresponds
to the path s^j. Stop.
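Steps 1 to 4 map naturally onto a priority queue. Several details of the sketch below are assumptions. It uses a random N(0,1) finite state code, takes the stopping value as the block length, and breaks ties by insertion order. With A = B = 0 the metric is plain squared error and never negative, so the procedure returns a minimum-distortion path. With the matched choice A = D*(1 - D*), B = (D*)², it need not.

```python
import heapq
import itertools
import random


def random_code(r, m, rng):
    """Hypothetical finite-state code: an independent N(0,1) letter per state."""
    return {(tau, t): rng.gauss(0.0, 1.0) for tau in range(r) for t in range(m)}


def stack_encode(z, code, b0, b, r, m, A, B):
    """Steps 1-4 of the stack encoding algorithm (sketch).

    Metric d*(z, zhat) = (z - zhat)**2 - (A + B*z**2); the stack is a heap
    in ascending order of cumulative metric D.  Stops when the top path
    carries len(z) letters (the stopping value is the block length here).
    """
    order = itertools.count()  # tie-breaker so paths are never compared
    # Step 1: initial paths 0..b0-1 with zero cumulative metric.
    stack = [(0.0, next(order), (s0,), s0 % m) for s0 in range(b0)]
    heapq.heapify(stack)
    while True:
        D, _, path, t = heapq.heappop(stack)   # top entry: least D
        j = len(path) - 1                       # letters on this path
        if j >= len(z):                         # Step 2 -> Step 4
            return path, D
        zj = z[j]                               # Step 3: extend by b branches
        for k in range(b):
            t2 = (t * b + k) % m
            metric = (zj - code[((j + 1) % r, t2)]) ** 2 - (A + B * zj ** 2)
            heapq.heappush(stack, (D + metric, next(order), path + (k,), t2))


rng = random.Random(0)
code = random_code(2, 8, rng)
z = [rng.gauss(0.0, 1.0) for _ in range(3)]
D_star = 0.3                                    # illustrative target distortion
path, D = stack_encode(z, code, 2, 4, 2, 8, A=D_star * (1 - D_star), B=D_star ** 2)
```

A heap gives logarithmic-time insertion in place of the "proper locations" search of step 3. The tie-breaking counter keeps equal metrics in insertion order.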

2. Results

The basic algorithms were modified in several ways in the
computer programs to simulate the encoding. A modification applying
to both the Viterbi and stack algorithms was that data (source outputs) of
magnitude greater than a certain cutoff c were encoded separately, using
one quantization region for each tail of the Gaussian distribution.

The additional coding needed to code extreme data separately
requires on the average rate

$$ R_c = H\big(\Phi(c) - \Phi(-c),\; 1 - \Phi(c),\; 1 - \Phi(c)\big) $$

where Φ is the unit Gaussian distribution function and H is the entropy
function defined by

$$ H\{p_i\} = -\sum_i p_i \log p_i . $$

Overall rate R is then

$$ R = R_c + [\Phi(c) - \Phi(-c)]\, R_t $$

where R_t is the tree coding rate.

For D_c the expected distortion of the extreme source values and D_t the
average distortion of tree coded source values, overall distortion D is
given by

It was determined experimentally that for both Viterbi and stack algorithms
the cutoff c should be in the region of 3.5 to 4 source standard deviations.
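The rate bookkeeping for separate coding of the tails can be checked numerically. The sketch takes the source to be unit-variance Gaussian, measures entropy in bits, and assumes a tree rate R_t = log₂ 32 = 5 bits, as in the simulations described below. The function names are illustrative.

```python
import math


def tail_prob(c):
    """P(z > c) = 1 - Phi(c) for a unit-variance Gaussian source."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def overall_rate(c, R_t):
    """R = R_c + [Phi(c) - Phi(-c)] R_t, R_c being the entropy of the region index."""
    tail = tail_prob(c)
    centre = 1.0 - 2.0 * tail              # Phi(c) - Phi(-c)
    R_c = -sum(p * math.log2(p) for p in (centre, tail, tail))
    return R_c + centre * R_t


for c in (3.5, 4.0):
    print(c, overall_rate(c, 5.0))
```

For c = 3.5 the overhead R_c is only about 0.006 bit, giving R of about 5.004 bits. This agrees with the statement below that the overall rate was about 5 bits per source output.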

The Viterbi algorithm with data cutoff 3.5 was simulated in IBM
System/360 assembler language. It was run on 60 blocks of length 250
source outputs each, with period r = 250, that is, with branches of the
code tree at different depths being assigned numbers independently; m
was 16,384, and b and b_0 were 32. Overall rate R was thus about 5 bits per
source output.

As given above, the lower limit of possible rate R versus distortion
D performance is given by

$$ R = -\tfrac{1}{2}\log_2 D \quad\text{or}\quad D = 2^{-2R}. $$

The Viterbi algorithm simulation just described was found to operate
at an overall distortion D = 1.31 (2^{-2R}). Because doing this required a
search of about 16 thousand branches per datum encoded, the simulation
could process only about 2 data per second.

Stack algorithm modifications were as follows: 

(a) The branches coming out of a node were grouped together and 
put as a group into the stack according to the best cumulative 
distortion metric of the group. When the group arrives at the top 
of the stack its best branch is removed and extended and the group 
is re-entered in the stack according to the best cumulative distortion 
metric of the paths remaining in it. 



(b) Whenever the stack contained more than 3,000 path groups,
the group at the bottom of the stack (i.e., the group with the largest
distortion metric) was eliminated from further consideration. This
modification was required by the finiteness of the memory of the
computer.

(c) Whenever step 3 of the stack algorithm was executed any 
multiple of 100,000 times, all path groups except the 32 deepest 
into the tree were eliminated from further consideration. This 
modification speeds search through the tree in the event that the 
encoding is taking too long. 

The stack algorithm simulation was found to give performance of
the same order of magnitude as did the Viterbi algorithm simulation.
It was run on the same 60 blocks of data of length 250 each which the
Viterbi algorithm used. Parameters were b = 32 branches per node,
period r = 1, m = 2^29 branch states, and b_0 = 32 initial states. Thus
overall rate was again about 5 bits per source output. Distortion metric
parameters A, B given by the limit of performance of the best possible
coding were found to give the most efficient results. That is, a D* is
chosen and A, B are set at A = D*(1 - D*), B = (D*)². Varying D*
varies the distortion obtained and also the amount of search performed.

The stack algorithm simulation just described was found to give
overall distortions of D = 1.28 (2^{-2R}) and D = 1.25 (2^{-2R}) with searches
of about 14 thousand and 23 thousand branches per datum respectively. It
required about 7% longer to search each branch than required in the
Viterbi algorithm.
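The reported distortions can be restated as distances from the bound D = 2^{-2R}. Since R(D) = -½ log₂ D, a distortion f · 2^{-2R} means the code spends ½ log₂ f bits per sample more than R(D) at that distortion. The helper below is only a convenience for that arithmetic.

```python
import math


def rate_distortion_gap(factor):
    """For D = factor * 2**(-2R): excess over the bound in dB and in bits.

    With R(D) = -0.5*log2(D), the achieved rate exceeds R(D) at the same
    distortion by 0.5*log2(factor) bits per source output.
    """
    return 10.0 * math.log10(factor), 0.5 * math.log2(factor)


for name, factor in [("Viterbi", 1.31), ("stack", 1.28), ("stack", 1.25)]:
    db, bits = rate_distortion_gap(factor)
    print(f"{name}: {db:.2f} dB, {bits:.3f} bit")
```

This gives about 1.17 dB (0.195 bit) for the Viterbi run at 1.31 and about 0.97 dB (0.161 bit) for the stack run at 1.25.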
Figure Caption

Fig. 1: Example of a partial coding tree of rate R = 2 for a 

Gaussian source with a square error fidelity criterion. 
III-D. Variable Length-to-Block Coding of
Fixed Rate Sources 

There are two practical problems associated with noiseless
source coding: (a) optimal codes require a codebook
table look-up, (b) real-time variable length coding and real-time
decoding data retrieval are both subject to buffer overflow.
A partial answer to problem (a) is Elias source coding
as described in Appendix A of Jelinek: Probabilistic
Information Theory. Problem (b) for block-to-variable
length coding has also been analyzed there. It is, however,
of interest to analyze the buffer overflow problem of
variable length-to-block coding that assigns constant length
codewords to variable length source output sequences. (It
is thus a generalization of run length coding.) The reason
is the word-like character of computer storage that makes
retrieval of constant length codewords much easier. In a
paper to be published in IEEE Transactions on Information
Theory (the abstract can be found below) Schneider and Jelinek

derive tight bounds on buffer overflow probabilities. For binary 
sources that are more skew than (0.8, 0.2), variable length-to- 
block coding leads to lower probabilities of buffer overflow than 
does the usual block-to-variable length coding. 



ON VARIABLE LENGTH-TO-BLOCK CODING* 

by 

K. Schneider, Member IEEE 
2 

F. Jelinek, Senior Member IEEE 
ABSTRACT 

Variable length-to-block codes are a generalization of run
length codes. A coding theorem is first proven. When the codes 
are used to transmit information from fixed rate sources through 
fixed rate noiseless channels, buffer overflow results. The 
latter phenomenon is an important consideration in the retrieval 
of compressed data from storage. The probability of buffer 
overflow decreases exponentially with buffer length and we 
determine the relation between rate and exponent size for memoryless 
sources. We obtain codes that maximize the overflow exponent 
for any given transmission rate exceeding entropy, and present 
asymptotically optimal coding algorithms whose complexity 
grows linearly with codeword length. We compare error exponents 
corresponding to variable length-to-block, block-to-variable 
length, and block coding. 